Doing PDF right
Excerpt: Doing PDF right means being aware of errata for PDF 2.0. With over 65 issues now resolved and approved by the PDF Association’s PDF TWG, the body of knowledge in the “PDF Issues” GitHub repo is already significant.
About the author: Peter Wyatt is the PDF Association’s CTO and an independent technology consultant with deep file format and parsing expertise, who is a developer and researcher actively working on PDF technologies …
With over 65 issues resolved and approved by the PDF Association’s PDF Technical Working Group, the body of knowledge at “PDF Issues” is becoming significant.
Resolved issues occur in many of the core clauses of the latest PDF specification, including Syntax (Clause 7), Graphics (Clause 8), Text (Clause 9), and Rendering (Clause 10). The simplest corrections and clarifications range from typographic and formatting errors, through language clarifications (especially for readers for whom English is a second language and the nuances of ISO phrasing may cause confusion), to fixes for incorrect cross-references. Every error report is welcome, no matter how small! Please contribute by creating new issues!
Resolved issues also address more significant errors, such as missing (i.e., previously undocumented!) keys in dictionaries, changes to the optional/required status of certain keys, corrections to permitted key values, and even the provision of a missing attachment to Annex L. All such changes should be of great interest to every PDF developer!
Reporting and resolving identified errors and issues in any PDF 2.0-related standard ensures that PDF continually improves as an unambiguous interoperable file format with a clear and reliable appearance model and commonly defined expected behaviors across implementations. This helps everyone in the PDF ecosystem, from PDF developers to end-users. Whether PDF is your core technology or key to a larger solution, this information is critical to ensure interoperability.
As presented, each resolution refers to one or more GitHub Issue numbers, allowing developers to review the technical discussions leading up to each resolution. This record of open discussion provides technical background and insight into the occasional debates over alternative suggestions. The pdf-issues repository is fast becoming an invaluable source of education and information on a wide range of technical PDF topics.
Although the resolved issues are expressed as marked-up changes applied to the latest PDF 2.0 specification (ISO 32000-2:2020), many corrections are also highly relevant to earlier PDF specifications. This is because PDF is a backwards-compatible format and a lot of wording has been retained, or is only slightly adapted, from earlier PDF specifications. Clause numbering has largely remained unchanged between PDF 1.7 (ISO 32000-1:2008) and PDF 2.0 (ISO 32000-2:2020). PDF developers are therefore easily able to identify and map such corrections back to earlier specifications relevant to their implementations.
The PDF Association’s PDF Technical Working Group (TWG) meets regularly to review reported issues and approve those with identified proposed resolutions. PDF developers should therefore “watch” the GitHub pdf-issues repository and actively contribute to discussions on any issue that is relevant to them. It’s the single best way to inform the PDF TWG, the principal industry group that recommends proposed solutions, of all points of view. In the future, these industry-approved resolutions will be submitted to the ISO committee.
So if you are involved with PDF development and are not continually referring to the industry-approved PDF errata, you are probably doing PDF wrong!
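For developers who would rather follow the repository from a script than through GitHub’s “watch” button, the following is a minimal sketch using GitHub’s public REST API for listing issues. The owner and repository names (`pdf-association`, `pdf-issues`) are assumptions, since this article only refers to the “pdf-issues” repo by name, and the helper names `issues_url`, `summarize`, and `fetch_issues` are purely illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the repository lives at this owner/name on GitHub. The article
# only calls it the "pdf-issues" repo, so adjust these if they differ.
OWNER = "pdf-association"
REPO = "pdf-issues"


def issues_url(owner, repo, state="closed", per_page=50):
    """Build a GitHub REST API URL that lists a repository's issues."""
    query = urllib.parse.urlencode({"state": state, "per_page": per_page})
    return f"https://api.github.com/repos/{owner}/{repo}/issues?{query}"


def summarize(items):
    """Return (number, title, label names) for issues, skipping pull requests."""
    rows = []
    for item in items:
        # The issues endpoint also returns pull requests, which carry a
        # "pull_request" key; they are not errata discussions, so skip them.
        if "pull_request" in item:
            continue
        labels = [label["name"] for label in item.get("labels", [])]
        rows.append((item["number"], item["title"], labels))
    return rows


def fetch_issues(owner=OWNER, repo=REPO, state="closed"):
    """Fetch one page of issues (unauthenticated, so rate limits apply)."""
    request = urllib.request.Request(
        issues_url(owner, repo, state),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return summarize(json.load(response))
```

Calling `fetch_issues()` returns only the first page of closed issues. Note that a closed issue is not necessarily one the PDF TWG has approved, so each issue's labels and discussion should still be checked by hand.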