Image Cerberus

Plug-in for the SpamAssassin© anti-spam filter against image spam

Image spam is a common technique used by spammers: they embed the text of their message in images, so that it is easily readable by human beings but much harder for machines (and hence spam filters) to read. Image Cerberus is designed to detect image spam. It performs a visual analysis of the images attached to e-mails in order to determine whether an image is likely to contain a spam message. This approach has been shown to be effective against image spam (see CEAS 2008).
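To make the idea of a visual classifier more concrete, here is a minimal sketch. It is not Image Cerberus's actual code. It assumes that a few visual features are extracted from each image, normalised, and then scored by a linear classifier. The data files "data.normalizer" and "data.classifier" suggest a normalise-then-classify pipeline, but their format and the classifier actually used are not documented here. The class names, the two features, the weights and the threshold below are therefore all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Normalizer:
    """Hypothetical per-feature z-score normaliser: (x - mean) / std."""
    means: Sequence[float]
    stds: Sequence[float]

    def apply(self, features: Sequence[float]) -> List[float]:
        # Features with zero spread carry no information; map them to 0.
        return [(x - m) / s if s > 0 else 0.0
                for x, m, s in zip(features, self.means, self.stds)]


@dataclass
class LinearClassifier:
    """Hypothetical linear decision function: w . x + b."""
    weights: Sequence[float]
    bias: float

    def score(self, features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(self.weights, features)) + self.bias


def is_image_spam(features: Sequence[float], normalizer: Normalizer,
                  classifier: LinearClassifier, threshold: float = 0.0) -> bool:
    """Flag an image as spam when its normalised features score above threshold."""
    return classifier.score(normalizer.apply(features)) > threshold


# Hypothetical features: [fraction of image area covered by text, number of text regions].
norm = Normalizer(means=[0.1, 2.0], stds=[0.1, 2.0])
clf = LinearClassifier(weights=[1.5, 0.5], bias=-0.5)
print(is_image_spam([0.45, 8], norm, clf))  # text-heavy banner → True
print(is_image_spam([0.02, 0], norm, clf))  # ordinary photo → False
```

A linear model over normalised features is cheap to evaluate, which fits the speed argument made for visual approaches. The real plug-in may use a different classifier.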
Why use the Image Cerberus plug-in?
Several OCR-based modules against image spam are already available as SpamAssassin plug-ins:
OCR-based approaches can be effective against image spam only when the embedded text is clean, so that the textual information can actually be extracted from the image. This is often difficult, because spammers apply text obfuscation techniques designed to defeat OCR tools. In such cases, OCR-based approaches cannot determine whether the image belongs to a legitimate e-mail or not. Moreover, they have a high computational cost (they are not very fast!).
Visual approaches have a lower computational cost than OCR systems (they are faster), and they can be effective even when the text message cannot be extracted from the image. On the other hand, their classification results are less reliable than those of OCR systems, since they do not perform a semantic analysis of the content as OCR-based approaches do.
We are also working on an architecture that uses both OCR-based and visual approaches. Our aim is to exploit the complementarity of the two approaches to improve performance and classification reliability while keeping the computational cost low.
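One plausible way to exploit this complementarity while keeping the cost low is a cascade. The fast visual classifier runs on every image, and the slower OCR stage is invoked only when the visual score is inconclusive. The sketch below is a hypothetical illustration of that idea and does not describe the architecture under development. The function name, the convention that scores are spam probabilities in [0, 1], the 0.2/0.8 thresholds and the averaging rule are all assumptions.

```python
from typing import Callable, Optional


def cascade_spam_score(image: bytes,
                       visual_score: Callable[[bytes], float],
                       ocr_score: Callable[[bytes], Optional[float]],
                       low: float = 0.2, high: float = 0.8) -> float:
    """Hypothetical two-stage combination of a visual and an OCR-based filter.

    Both scorers are assumed to return a spam probability in [0, 1];
    ocr_score returns None when no text could be extracted.
    """
    v = visual_score(image)
    # Confident visual decision: skip the expensive OCR stage entirely.
    if v <= low or v >= high:
        return v
    o = ocr_score(image)
    # OCR failed (e.g. obfuscated text): keep the visual estimate.
    if o is None:
        return v
    # Both opinions available: average them.
    return (v + o) / 2


# Stub scorers standing in for the real classifiers.
ocr_calls = []


def stub_ocr(image: bytes) -> Optional[float]:
    ocr_calls.append(image)
    return 0.1


print(cascade_spam_score(b"banner", lambda img: 0.9, stub_ocr))  # OCR not called
print(cascade_spam_score(b"unsure", lambda img: 0.5, stub_ocr))  # combined with OCR
```

With this design, most images pay only the cost of the visual stage. OCR is reserved for the uncertain cases, where its semantic analysis adds the most.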
Our initial tests (carried out on a personal corpus of about a hundred e-mails; see our spam repository and CEAS 2008) show that Image Cerberus can contribute effectively to image spam recognition. A quantitative analysis of the technique implemented in Image Cerberus, along with further considerations, can be found in our publications on spam filtering.
  • Download the archive containing all the needed files (including the cf file).
  • Put the three files "ImageCerberusPLG", "data.classifier" and "data.normalizer" in a directory on your hard disk (e.g. /etc/spamassassin/imageCerberus or ~/.spamassassin/imageCerberus).
  • Then copy the two plug-in files into SpamAssassin's local configuration folder, and remember to set the path to ImageCerberusPLG and to the data files.
  • Restart SpamAssassin to start using the Image Cerberus plug-in.
  • If needed, edit the configuration file to set your own custom score. Remember to restart SpamAssassin after any change.
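Before restarting SpamAssassin, you may want to check that the three files are where you put them. The following Python helper is a hypothetical convenience and is not part of the distribution. It only checks that each file exists in the chosen directory. It does not verify that the paths configured for SpamAssassin point to them.

```python
from pathlib import Path
from typing import List

# The three files named in the installation steps.
REQUIRED_FILES = ("ImageCerberusPLG", "data.classifier", "data.normalizer")


def missing_plugin_files(install_dir: str) -> List[str]:
    """Return the required Image Cerberus files that are absent from install_dir."""
    base = Path(install_dir).expanduser()  # also accepts "~/..." paths
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]
```

For example, `missing_plugin_files("~/.spamassassin/imageCerberus")` returns an empty list when all three files are present in that directory.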
We would like to thank Radhakrishna Achanta and Sabine Süsstrunk of the Images and Visual Representation Group (IVRG) in the Audiovisual Communications Laboratory (LCAV) at Ecole Polytechnique Fédérale de Lausanne (EPFL), who contributed their text localisation algorithm to the plug-in's development. We would also like to thank two Spanish Erasmus students, Victor Cruz and Maria Carmen Montalban, who contributed to the plug-in's development during their stay in Cagliari.
Image Cerberus requires working installations of SpamAssassin, the Intel OpenCV library and convert (from ImageMagick).
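A quick way to confirm these prerequisites is to look for the executables on the PATH and for the OpenCV shared library. This is a hypothetical helper sketch and is not part of Image Cerberus. The executable names `spamassassin` and `convert` are the standard ones. The OpenCV library names probed (`opencv_core` for modern builds, `cxcore` for the OpenCV 1.x releases) are assumptions that depend on the installed version and platform.

```python
import ctypes.util
import shutil
from typing import Dict


def check_dependencies() -> Dict[str, bool]:
    """Report whether each runtime dependency appears to be installed."""
    return {
        # SpamAssassin's command-line front end.
        "SpamAssassin": shutil.which("spamassassin") is not None,
        # ImageMagick's convert tool.
        "convert (ImageMagick)": shutil.which("convert") is not None,
        # OpenCV shared library; the names probed here are an assumption.
        "OpenCV": any(ctypes.util.find_library(name) is not None
                      for name in ("opencv_core", "cxcore")),
    }


for dependency, found in check_dependencies().items():
    print(f"{dependency}: {'found' if found else 'MISSING'}")
```

A missing entry does not prove the dependency is absent (it may be installed outside the standard search paths), but it is a good first thing to check if the plug-in fails to start.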
This software is released under the Apache License (version 2.0). Improvements and redistribution are welcome and warmly encouraged.
The Image Cerberus plug-in is provided "as is", without warranty of any kind. We assume no responsibility for its performance or for any damage arising from its use. Use it at your own risk!