Deriving HTML from PDF
An implementation of an algorithm that converts well-tagged PDF files into HTML.
Since 2017 we have been actively participating in the PDF Association Technical Working Group, with the aim of addressing the industry's need to change the way PDF files are consumed on mobile devices. The main question was whether a traditional fixed-layout PDF contains enough information to be safely and unambiguously interpreted as HTML, and thus made responsive and reusable in different environments.
The output of this work is the Derivation algorithm, a document that describes how the conversion can be done.
As part of this work we put together a reference set of PDF documents and an implementation. Together they should provide enough insight into the whole concept.
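To give a rough feel for the idea, here is a minimal Python sketch of how a tagged structure tree might be turned into HTML. It is not the Derivation algorithm itself and is not taken from our implementation: the StructElem class, the derive_html function, and the small TAG_MAP table are hypothetical names used only for illustration, and the mapping covers just a handful of standard PDF structure types (for example, every list is simplified to an unordered list).

```python
from dataclasses import dataclass, field
from html import escape

# Illustrative subset of PDF standard structure types and the HTML
# elements they are assumed to correspond to in this sketch.
TAG_MAP = {
    "Document": "div",
    "P": "p",
    "H1": "h1",
    "H2": "h2",
    "L": "ul",      # simplification: list numbering is ignored
    "LI": "li",
    "Table": "table",
    "TR": "tr",
    "TH": "th",
    "TD": "td",
}


@dataclass
class StructElem:
    """Hypothetical in-memory stand-in for a PDF structure element."""
    tag: str
    text: str = ""
    kids: list = field(default_factory=list)


def derive_html(elem: StructElem) -> str:
    """Recursively emit HTML for a structure element and its children."""
    # Unknown structure types fall back to a generic container.
    html_tag = TAG_MAP.get(elem.tag, "div")
    inner = escape(elem.text) + "".join(derive_html(k) for k in elem.kids)
    return f"<{html_tag}>{inner}</{html_tag}>"


if __name__ == "__main__":
    doc = StructElem("Document", kids=[
        StructElem("H1", "Deriving HTML from PDF"),
        StructElem("P", "Tagged content becomes responsive markup."),
    ])
    print(derive_html(doc))
```

The sketch only works because a well-tagged PDF already carries its logical structure; an untagged file offers no such tree to walk.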
If you are interested in this work, please follow us on GitHub: https://github.com/Normex/PDF-Derivation