What’s holding PDF back?

PDF has long been the de facto format for exchanging print-oriented documents on the Web, and for good reason: it works, and reliably so. However, as the lowest common denominator it is an end-of-line format, and it is under constant threat from HTML, which is reflowable, more flexible and, by definition, editable.

Enhancements to PDF have so far been limited to “bolt-on” approaches, such as tagging (for accessibility, reflow or deriving HTML) or simply embedding the original file. However, even when the original file and the software that created it are available, the robustness of the layout is lost as soon as the document is opened for editing, which makes such a document unsuitable for interchange among different parties.

Similarly, the editing functions of commercial PDF software from vendors such as Adobe and Foxit are suitable only for small corrections, not for repeated changes to the document.

The Editable PDF Initiative (editablepdf.org) seeks to further the concept of tagging in order to make PDF truly suitable as an interchange format, with a level of editability that meets or surpasses that of HTML. This is achieved through two major advances: first, an extension of the tagging schema that defines basic visual structure and intuitive reflowing rules; and second, a flexible typesetting algorithm that is robust to minor font and content changes.

The ambitious goal of the project is to turn PDF into a format that is universally editable across a variety of applications, in a similar way to HTML. This presentation shows how the extended tagging information can be embedded in a PDF, resulting in a file that can still be viewed by all current software but gains additional, robust editing functionality when opened by supporting software.