Deformed table restoration in scanned PDF

As one of the most common data-representation formats, the table has taken a significant place in documents because of its capability of representing the wealth of information. However, in reality, a vast of tables are published in an uneditable format such as scanned PDF or photo.

Therefore, the table reconstruction project aims to detecting the table regions and recognizing the contents from the uneditable files. Then, it would extract the valid information and recognize the table structure from the extracted table. After all, it would reconstruct an editable table by combining the obtained table contents and structure for the user’s re-creation.

The difficulties of the project are the table recognition and reconstruction from the unnormal scheme such as deformed tables, contaminated tables and tables that are photographed with different illumination. To solve the problems, we provide a bunch of deep learning methods to improve table reconstruction performance.