Deriving HTML from PDF – lessons learned

Two years after introducing the Deriving HTML from PDF document and implementing its core concept, and after processing countless authored and un-authored PDF files, we will share our experiences. To adopt the idea successfully, developers need to understand the implementation challenges, and authors have to change their habits in producing PDF files. We will discuss the gaps in the design and in the nature of the whole process, the lack of authoring tools, and how to overcome them. The knowledge gathered has resulted in updates to the PDF specification and additional work at the standardisation level. We will talk briefly about these changes and how they will improve the reusability of PDF content in general.