Demo of Form Processing tool

Developed by the Pattern Recognition and Applications group of the University of Cagliari

Contact person: Prof. Fabio Roli


This demo shows the functionality of a document form processing tool that we developed during a project with the STI company.

The goal of the project was to develop a form processing system that reads, from scanned images, the text entered by users in the fields of document forms (such as tax payment forms).

Our tool performs the following steps:

  1. Skew detection and correction.
  2. Localization of the fields of interest, based on prior information about the form layout and about each field's position and layout (the position of each field may vary).
  3. Removal of pre-printed text and lines from each field image, and reconstruction of the user text.
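
To make the first step more concrete, here is a minimal sketch of skew detection by projection-profile search. The description above does not say which method the tool uses, so this is only an illustration of one common approach. The names `projection_score` and `estimate_skew`, the search range, and the angle step are all assumptions made for this example. The input is a list of (x, y) coordinates of dark pixels. For each candidate angle the points are rotated back by that angle and counted per row; text lines that are perfectly horizontal concentrate their pixels in few rows, which maximizes the sum of squared row counts.

```python
import math

def projection_score(points, angle_deg):
    """Sum of squared row counts after rotating the points back by angle_deg."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rows = {}
    for x, y in points:
        # Row index of the point after undoing a rotation of angle_deg.
        r = round(-x * sin_t + y * cos_t)
        rows[r] = rows.get(r, 0) + 1
    return sum(count * count for count in rows.values())

def estimate_skew(points, max_angle=10.0, step=0.5):
    """Return the candidate angle (in degrees) with the sharpest row profile."""
    best_angle, best_score = 0.0, -1
    steps = int(round(2 * max_angle / step))
    for i in range(steps + 1):
        angle = -max_angle + i * step
        score = projection_score(points, angle)
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle
```

Correction then amounts to rotating the image by the negative of the estimated angle. The resolution of the estimate is limited by `step`, so a real system would likely refine the search around the best candidate.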

In this demo we start from the image of a clean Italian tax payment form with some fields already filled out (the post office stamp, the name of the taxpayer and three "codeline" fields at the bottom of the form), and generate an artificial image that simulates a noisy, scanned copy of a filled-out paper version of the same form.

The user can choose:

  1. The text of two typewritten fields characterized by different layouts: an account number (with a horizontal line running below the user text) and the amount of payment in Euros (one box for each digit),
  2. The skew angle,
  3. The type and amount of noise affecting the entered text, to simulate the defects of a printer,
  4. The type and amount of noise affecting the whole image, to simulate the noise introduced by a scanner.
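
As a rough illustration of how such whole-image defects could be simulated, the sketch below generates two of them on a binary image stored as a list of rows (0 = white, 1 = black): clustered salt-and-pepper noise and the blackening of one border. It is not the code used by the demo. The function names, the `cluster_size` parameter, the fixed random seed, and the image representation are assumptions chosen for this example.

```python
import random

def add_clustered_noise(image, percent, cluster_size=2, seed=0):
    """Overwrite about `percent` % of the pixels with small random clusters.

    Each cluster is a cluster_size x cluster_size block set to a random value
    (0 or 1), so an affected pixel may keep its original value by chance.
    Returns the noisy copy and the set of affected (row, column) positions.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    noisy = [row[:] for row in image]
    target = int(height * width * percent / 100.0)
    touched = set()
    while len(touched) < target:
        cy, cx = rng.randrange(height), rng.randrange(width)
        value = rng.randint(0, 1)
        for dy in range(cluster_size):
            for dx in range(cluster_size):
                y, x = cy + dy, cx + dx
                if y < height and x < width and len(touched) < target:
                    noisy[y][x] = value
                    touched.add((y, x))
    return noisy, touched

def blacken_border(image, percent, side="left"):
    """Return a copy with `percent` % of the width or height painted black."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    if side in ("left", "right"):
        n = int(width * percent / 100.0)
        cols = range(n) if side == "left" else range(width - n, width)
        for row in out:
            for x in cols:
                row[x] = 1
    elif side in ("top", "bottom"):
        n = int(height * percent / 100.0)
        rows = range(n) if side == "top" else range(height - n, height)
        for y in rows:
            out[y] = [1] * width
    else:
        raise ValueError("side must be 'left', 'right', 'top' or 'bottom'")
    return out
```

Both functions leave the input image untouched and return a new one, so the defects can be chained in any order, for example `blacken_border(add_clustered_noise(img, 5)[0], 10, "top")`.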

In this demo we show the resulting artificial noisy scanned image and the results of its processing by our tool: the same image after skew correction, the position of the localized fields, and the image of each localized field after removal of the pre-printed text and lines and reconstruction of the user text.

Enter the text of the two fields highlighted in red in the image above, and choose the horizontal and vertical shift (in pixels) of the text with respect to its correct position inside the fields.

Field                                     Content     Shift (in pixels)
Account number (field on the left)                    Horiz:     Vert:
Amount of payment (field on the right)                Horiz:     Vert:

Choose the amount of noise affecting the user text entered above:

Defect example

Defect description and parameters

Printer noise on user text (edge degradation)

Amount: 

Choose the types and amounts of noise affecting the whole image (see the examples in the figures on the left):

Defect example

Defect description and parameters

Edge degradation (printer and/or scanner noise on all image edges: pre-printed lines and text, user text)

Amount:

Uniform noise (salt-and-pepper noise grouped into small clusters of noisy pixels)

Percentage of image pixels affected by the noise:

Blackening of one of the image borders

Percentage of the image width or height:
Border side:

Page border deformation (book-like)

Percentage of the image width or height:
Border side:

Skew angle (in degrees):
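
For reference, applying a skew angle to a binary image can be sketched as a rotation about the image center with nearest-neighbor sampling. This is only an illustrative sketch, not the demo's actual implementation: the function name `skew_image`, the list-of-rows image representation, and the choice to keep the original image size (so corners that rotate out of the frame are lost) are assumptions.

```python
import math

def skew_image(image, angle_deg, background=0):
    """Rotate a binary image (list of rows) by angle_deg about its center."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    height, width = len(image), len(image[0])
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    out = [[background] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - cx, y - cy
            # Inverse mapping: rotate the destination offset by -theta to
            # find the source pixel, which avoids holes in the output.
            sx = round(cx + dx * cos_t + dy * sin_t)
            sy = round(cy - dx * sin_t + dy * cos_t)
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = image[sy][sx]
    return out
```

Calling the same function with the negated angle approximately undoes the skew, up to the rounding errors of nearest-neighbor sampling and the loss of pixels rotated outside the frame.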