Image spam filtering using textual and visual information

Title: Image spam filtering using textual and visual information
Publication Type: Conference Paper
Year of Publication: 2007
Authors: Fumera, G, Pillai, I, Roli, F, Biggio, B
Conference Name: MIT Spam Conference 2007
Date Published: 30/03/2007
Conference Location: Cambridge, MA, USA
Keywords: doc00, doc02, document categorisation, spam filtering
Abstract

This paper focuses on so-called "image spam": spam messages embedded in images attached to e-mails, which circumvents statistical techniques based on analysis of the e-mail body text (such as "Bayesian filters"). Spammers additionally apply content obscuring techniques to these images to make them unreadable by standard OCR systems without compromising human readability. We argue that computer vision techniques, in particular visual pattern recognition and image processing, will play a prominent role against image spam. We then discuss two possible approaches to defeating image spam: exploiting the high-level textual information embedded in images by combining OCR and text categorization techniques, and exploiting low-level image information to detect the content obscuring techniques applied to spam images. We also report results of an experimental investigation on a large data set of spam e-mails, aimed at evaluating the effectiveness of combining standard OCR and text categorization techniques, together with preliminary results on the use of low-level features to detect image defects (such as broken or merged characters in a binarized image) that are typical consequences of the content obscuring techniques spammers use.
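
As a rough illustration of the second approach, the sketch below counts connected components in a binarized image and flags those that are unusually small (possible fragments of broken characters) or unusually wide (possible merged characters). It is not the feature set used in the paper: the function names `connected_components` and `defect_scores`, the thresholds `small_ratio` and `wide_ratio`, and the use of the median component as a stand-in for a "normal" character are all assumptions made for this example.

```python
from collections import deque
from statistics import median


def connected_components(img):
    """Return the 4-connected ink components of a binary image.

    `img` is a list of rows containing 0 (background) or 1 (ink).
    Each component is a list of (row, col) pixel coordinates.
    """
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and img[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            comps.append(pixels)
    return comps


def defect_scores(img, small_ratio=0.25, wide_ratio=2.0):
    """Estimate how many components look like fragments or merged blobs.

    Assumption: the median component approximates an intact character, so
    much smaller areas suggest broken strokes and much larger widths
    suggest characters fused together.
    """
    comps = connected_components(img)
    if not comps:
        return {"components": 0, "fragment_fraction": 0.0, "merged_fraction": 0.0}
    areas = [len(p) for p in comps]
    widths = [max(x for _, x in p) - min(x for _, x in p) + 1 for p in comps]
    med_area, med_width = median(areas), median(widths)
    fragments = sum(a < small_ratio * med_area for a in areas)
    merged = sum(w > wide_ratio * med_width for w in widths)
    n = len(comps)
    return {
        "components": n,
        "fragment_fraction": fragments / n,
        "merged_fraction": merged / n,
    }
```

High fragment or merged fractions could then serve as inputs to a classifier alongside other evidence. In practice the thresholds would need tuning on real data, and punctuation or dots on letters such as "i" would also count as small components.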

Citation Key: 70
Download: fumera_SC2007.pdf (513.42 KB)