Form-processing tool

This application was developed within the project "Extended recognition of digitized documents - REDD", funded by STI SpA between 2007 and 2009 (partners: Signal Processing & Telecommunications Group - Numerical Image Processing, Dept. of Biomedical and Electronic Eng., University of Genoa, Italy). The aim was to develop a prototype system for the automatic extraction and recognition of user text from scanned images of complex, real-world document forms such as invoices and tax payment receipts (an example is shown in the figure below). The goal was to localize all the form fields of a document, then extract and recognize the text entered by the user. Our tasks in this project were:

  • skew angle estimation and correction;
  • form layout recognition;
  • localization of the individual fields;
  • removal of pre-printed form graphics and text and reconstruction of user text.
Example of a tax payment form considered in this project.
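
As an illustration of the first task in the list above, skew angle estimation, here is a minimal sketch based on the classical projection-profile idea. This is not necessarily the method used in the prototype. The function names (projection_score, estimate_skew), the search range and the step size are assumptions made for this example. The input is assumed to be the list of (x, y) coordinates of the dark pixels of a binarized scan.

```python
import math
from collections import Counter


def projection_score(points, angle_deg):
    """Score a hypothesized skew angle (hypothetical helper).

    Each foreground pixel is projected onto the axis perpendicular to
    lines tilted by angle_deg. When the hypothesis matches the real
    skew, pixels of the same line fall into the same row, so the sum
    of squared row counts becomes large.
    """
    theta = math.radians(angle_deg)
    s, c = math.sin(theta), math.cos(theta)
    rows = Counter(round(y * c - x * s) for x, y in points)
    return sum(n * n for n in rows.values())


def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate angle (degrees) with the highest score.

    max_angle and step are illustrative defaults, not project values.
    """
    n_steps = int(round(2 * max_angle / step))
    candidates = [-max_angle + i * step for i in range(n_steps + 1)]
    return max(candidates, key=lambda a: projection_score(points, a))


# Synthetic input: four ruled lines, 200 pixels long, tilted by 3 degrees.
true_skew = 3.0
slope = math.tan(math.radians(true_skew))
demo_points = [(x, round(y0 + x * slope))
               for y0 in (0, 20, 40, 60)
               for x in range(200)]
demo_estimate = estimate_skew(demo_points)
```

On this synthetic input the estimate should land on the 3 degree tilt, up to the resolution of the search step. Correcting the skew then amounts to rotating the image by the opposite of the estimated angle. On real scans, a coarse search followed by a finer one around the best candidate is a common refinement.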

The subsequent steps (text segmentation, character recognition and contextual post-processing) are the responsibility of the project partner DIBE.

This real-world task is particularly difficult because of the low image quality caused by noise introduced during scanning, the complex and variable layout and pre-printed components of the forms, and the frequent overlap between user text and pre-printed form components.

To summarize the current status of our prototype, we make available an online demo tool and a short video.