Image Cerberus: a SpamAssassin plug-in against image spam

 

Sourceforge project webpage

 

What is it?

Image Cerberus is a plug-in for the SpamAssassin© spam filter, designed to counter image spam. It was developed entirely by the P.R.A. Group (Pattern Recognition and Applications Group) of the Department of Electrical and Electronic Engineering (D.I.E.E.) at the University of Cagliari (Italy) and by the Ambient Intelligence Laboratory of Sardegna DistrICT at Sardegna Ricerche (Italy).

Image spam is a common technique used by spammers: they embed the text of the message in an image, so that it is easy for humans to read but hard for machines (and hence spam filters) to read.

Image Cerberus is designed to detect image spam. It performs a visual analysis of the images attached to an email to determine whether each image is likely to contain a spam message. This approach has been shown to be effective against image spam (see CEAS 2008).

Why use Image Cerberus?

Several OCR-based modules against image spam are already available as SpamAssassin plug-ins.

OCR-based approaches can be effective against image spam only when the image text is clean enough for the textual information to be extracted. This is often difficult, because spammers use text obfuscation techniques against OCR tools. In that case, OCR-based approaches cannot determine whether the image belongs to a legitimate e-mail or not. Moreover, they have a high computational complexity (they are not so fast!).

Visual-based approaches have a lower computational complexity (they are faster) than OCR systems, and they can also be effective when the text message cannot be extracted from the image. On the other hand, their classification results are less reliable than those of OCR systems, because they perform no semantic analysis of the content, as OCR-based approaches do.

We are working on an architecture that uses both OCR and visual-based approaches. Our aim is to exploit the complementarity of the two approaches to improve performance and classification reliability while keeping the computational complexity low.

Our initial tests (carried out on a personal database of about a hundred mails, see our spam repository and CEAS 2008) show that Image Cerberus can contribute effectively to image spam recognition. A quantitative analysis of the technique we implemented for Image Cerberus, along with further considerations, can be found in our publications about spam filtering.

We need your feedback!

We are still working to improve our plug-in. The version available here is a preliminary one. Every contribution is extremely welcome, particularly to the testing phase, whether to the Perl code or to the performance tests: feel free to volunteer! Suggestions at any level, and any other contribution as an end user, are warmly welcome.

Dependencies

Image Cerberus needs working versions of SpamAssassin, the Intel OpenCV library, and convert (ImageMagick).
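
A quick way to check that the command-line dependencies are on the PATH is sketched below. It assumes a POSIX shell; the function name missing_tools is hypothetical and is not part of the plug-in. OpenCV is a library rather than a command, so this check does not cover it.

```shell
# Print the name of every given command that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Example: check SpamAssassin and ImageMagick's convert.
# missing_tools spamassassin convert
```

If the example call prints nothing, both programs were found.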

Installation

Download the archive ImageCerberusPLG.zip, which contains all the needed files (including the cf file and ImageCerberusPLG.pm), from here:

http://sourceforge.net/projects/imagecerberus/

Put the three files ImageCerberusPLG, data.classifier and data.normalizer in a directory on your hard disk (e.g. /etc/spamassassin/imageCerberus or ~/.spamassassin/imageCerberus).

Then copy the two files ImageCerberus.cf and ImageCerberus.pm into the local configuration folder of SpamAssassin, and remember to set in ImageCerberus.cf the path of ImageCerberusPLG and of the data files. After that, you only have to restart SpamAssassin to start working with the Image Cerberus plug-in.

If needed, edit the configuration file ImageCerberusPLG.cf to set up your custom score. Remember to restart SpamAssassin after any change.
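
The copy steps above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name install_image_cerberus and its three directory arguments are hypothetical, and the glob patterns are a choice of this sketch that avoids hard-coding the .cf and .pm file names. The path inside the .cf file still has to be edited by hand, and SpamAssassin restarted in whatever way your system provides.

```shell
# Usage: install_image_cerberus SRC DATA_DIR SA_CONF_DIR
#   SRC          directory where the archive was unpacked (assumption)
#   DATA_DIR     e.g. /etc/spamassassin/imageCerberus
#   SA_CONF_DIR  SpamAssassin local configuration folder (system-dependent)
install_image_cerberus() {
    src=$1
    data_dir=$2
    sa_conf_dir=$3

    mkdir -p "$data_dir" || return 1
    # The three files named in the instructions go into the data directory.
    cp "$src/ImageCerberusPLG" "$src/data.classifier" "$src/data.normalizer" \
        "$data_dir/" || return 1
    # The plug-in module (.pm) and its configuration (.cf) go into the
    # SpamAssassin configuration folder.
    cp "$src"/*.cf "$src"/*.pm "$sa_conf_dir/" || return 1
}

# Example (paths are assumptions, adjust them to your system):
# install_image_cerberus ./ImageCerberusPLG /etc/spamassassin/imageCerberus /etc/spamassassin
```

After editing the path in the .cf file, running spamassassin --lint is one way to check that the configuration still parses before restarting.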

Acknowledgements

We would like to thank Radhakrishna Achanta and Sabine Süsstrunk of the Images and Visual Representation Group (IVRG) in the Audiovisual Communications Laboratory (LCAV) at Ecole Polytechnique Fédérale de Lausanne (EPFL) for contributing their text localisation algorithm to the plug-in development. We would also like to thank two Spanish Erasmus students, Victor Cruz and Maria Carmen Montalban, who contributed to the plug-in development during their stay in Cagliari.

Contact us!

At the moment Ignazio Pillai and Battista Biggio are working on this project.

License

This software is released under the Apache Software License (version 2.0). Every improvement and redistribution is approved and warmly encouraged.

Disclaimer

The Image Cerberus plug-in is provided "as is" without warranty of any kind. We assume no responsibility for its performance or for any damage arising out of the use of the software. Use it at your own risk!

Change-Log

2008-09-25
Added a timeout control to the external system call (thanks to B. Austin for the useful suggestions).