Spam filtering: prototypes

Image Spam Lab

To experiment with computer vision and pattern recognition techniques against image spam, we developed a tool for generating artificial spam images. The tool generates images whose embedded text is obfuscated with several techniques that spammers use in real spam e-mails. The user can enter any text (or choose one of the predefined texts) and can set the text features (font face and size), the width and height of the image, the obfuscation technique, and the obfuscation level. Three obfuscation techniques have been implemented so far, as in the following examples.

 

Fig.1: Three examples of real (left) and artificial (right) image spam, one for each obfuscation technique.
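
The user-selectable settings described above can be pictured as a single parameter record. The following is only a minimal sketch, not the tool's actual data model: the class name SpamImageSpec, the field names, the use of an integer index for the technique, and the 0-100 percent scale for the level are all assumptions made for illustration.

```python
from dataclasses import dataclass

# The text states that three obfuscation techniques are implemented;
# identifying them by an index 0..2 is an assumption of this sketch.
N_TECHNIQUES = 3


@dataclass(frozen=True)
class SpamImageSpec:
    """Hypothetical bundle of the settings a user can choose."""
    text: str          # text to embed in the image
    font_face: str     # font family name
    font_size: int     # font size (unit assumed, e.g. points)
    width: int         # image width in pixels
    height: int        # image height in pixels
    technique: int     # index of the obfuscation technique (assumed 0..2)
    level: int         # obfuscation level in percent (assumed 0..100)

    def __post_init__(self):
        # Reject settings that cannot describe a valid image.
        if not self.text:
            raise ValueError("text must not be empty")
        if self.font_size <= 0 or self.width <= 0 or self.height <= 0:
            raise ValueError("font_size, width and height must be positive")
        if not 0 <= self.technique < N_TECHNIQUES:
            raise ValueError("technique must be 0, 1 or 2")
        if not 0 <= self.level <= 100:
            raise ValueError("level must be between 0 and 100")


# Example settings: the values are made up for illustration.
spec = SpamImageSpec("Limited offer", "Arial", 24, 400, 120,
                     technique=0, level=50)
```

Validating the settings once, at construction time, keeps any later rendering step free of range checks.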

 

The obfuscation level can be tuned to produce anything from clean images to heavily obfuscated images that OCR tools cannot read but a human being still can, as in the examples below.

 

  

  

  

Fig.2: Examples of the three obfuscation techniques at 50% (left) and 100% (right) obfuscation level, applied to the original clean image (top).
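
To make the idea of a tunable obfuscation level concrete, the sketch below scales a simple distortion with a percentage. It is an illustration under stated assumptions, not one of the three techniques implemented in the tool: random pixel flipping is a stand-in chosen for simplicity, and the names render_clean, obfuscate and max_flip_percent are hypothetical. The "image" is a toy bitmap stored as nested lists, so no imaging library is needed.

```python
import random


def render_clean(width, height):
    """Return a toy bitmap (0 = white, 1 = black).

    A solid horizontal bar stands in for rendered text.
    """
    return [[1 if height // 3 <= y < 2 * height // 3 else 0
             for _ in range(width)]
            for y in range(height)]


def obfuscate(bitmap, level, max_flip_percent=30, seed=None):
    """Flip a share of pixels proportional to level (0..100).

    Random pixel flipping is an assumed stand-in technique.
    At level 100, max_flip_percent of all pixels are flipped.
    """
    if not 0 <= level <= 100:
        raise ValueError("level must be between 0 and 100")
    rng = random.Random(seed)
    height, width = len(bitmap), len(bitmap[0])
    # Integer arithmetic avoids floating-point rounding surprises.
    n_flips = width * height * max_flip_percent * level // 10000
    out = [row[:] for row in bitmap]  # copy so the input stays clean
    # Pick distinct pixel positions so each one is flipped only once.
    for pos in rng.sample(range(width * height), n_flips):
        y, x = divmod(pos, width)
        out[y][x] ^= 1
    return out


clean = render_clean(40, 12)
half = obfuscate(clean, 50, seed=1)    # moderate distortion
full = obfuscate(clean, 100, seed=1)   # strongest distortion
```

Level 0 returns an unchanged copy of the clean bitmap, and the number of altered pixels grows linearly with the level. A real generator would choose distortions that defeat OCR while keeping the text legible to people, which this toy noise model does not attempt to measure.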

 

A screenshot of our Image Spam Lab tool is shown below.

 

Fig.3: A screenshot of Image Spam Lab.