PDF: The document format for everything

Excerpt: The ISO-standardized PDF format and subset formats facilitate digital document solutions for today and tomorrow.


About the author: Dietrich von Seggern received his degree as a printing engineer, and in 1991 started his professional career as head of desktop prepress production in a reproduction house. He became involved in …

February 16, 2021
by Dietrich von Seggern


PDF is one of the most widely used formats worldwide. Numerous companies use it to exchange information between business partners or in-house. PDF offers a broad range of features, which explains why the format is complex. For certain types of uses it is sometimes necessary to define stringent quality standards in order to guarantee interoperability. Quality requirements for certain workflows and processes are defined via PDF’s subset formats such as PDF/X and PDF/A. These standards relieve users of the burden of technical details and help clarify responsibilities.

In order to guarantee reliable PDF processing given the diversity of PDF generation programs and the central importance of PDF in supporting line-of-business processes worldwide, PDF standards are developed by ISO-accredited committees. A frequently asked question is whether a single standard for “good quality” PDF would not be easier to deal with. The answer to this, and the reason for the (currently) six subset standards, lies in the format’s diverse application scenarios. The aim of the various ISO-standardized subsets of PDF is to provide users with files that are functional for their specific purposes. A single standard that would, for example, combine the requirements for high-end printing with the specifications for archiving or accessibility could certainly be defined, but it would entail enormous effort for developers of creation software and for users while meeting only a small fraction of real-world use cases.

The current ISO standard for PDF is PDF 2.0, first published in 2017 and recently updated to ISO 32000-2:2020. ISO 32000-2 is now about 1,000 pages long and contains many detailed improvements and some important clarifications compared to the previous 1.7 specification. A number of chapters have been completely rewritten to make the specifications for the PDF constructs, which remain largely unchanged, easier to understand and less ambiguous.
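As a small illustration of how software can tell which PDF version a file declares, the following Python sketch reads the version from the file header (a comment line such as "%PDF-2.0"). The function name and the choice to search only the first 1024 bytes are assumptions made for this example. The sketch also ignores the optional Version entry in the document catalog, which can override the header, so it should be read as a rough first check rather than a complete implementation.

```python
import re

# A PDF file begins with a header comment such as "%PDF-1.7" or "%PDF-2.0".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(data: bytes):
    """Return (major, minor) as declared in the PDF header, or None.

    Assumption for this sketch: only the first 1024 bytes are searched,
    and a /Version entry in the document catalog is not consulted.
    """
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Hypothetical, heavily truncated file content used only for demonstration.
sample = b"%PDF-2.0\n%\xe2\xe3\xcf\xd3\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
print(header_version(sample))
```

A declared version says nothing about conformance to a subset standard such as PDF/X or PDF/A; those are identified separately in the file's metadata.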

PDF/X for the printing industry

PDF files serve as the digital “raw material” for almost all professional printing, which involves many technical attributes that go far beyond what is needed for display on a monitor. For this reason the print industry formulated corresponding requirements shortly after adopting PDF, and developed PDF/X (the “X” stands for “eXchange”) under the ISO umbrella. The format is well-established in the printing industry and is supported by numerous software solutions. Specifications based on PDF/X, e.g. from the Ghent Workgroup or the Swiss initiative PDFX-ready, define further requirements for various printing processes and products.

The evolution of PDF/X reflects technical progress in the printing industry over time; PDF/X-4 (2008) is currently the most widely used variant, but PDF/X-6 was published late in 2020. This new “part” (a technical term) of PDF/X is based on PDF 2.0. The most important trend in printing is automation. PDF/X-6 makes it possible to automate diverse production settings for different pages in the same PDF. In addition, PDF/X-6 adds production parameters such as black point compensation (BPC), halftone origin and spectral measurement data for spot colors (CxF), further supporting the trend in printing towards fully automated processes in prepress, print and postpress.

PDF/A for long-term archiving

PDF/A, the archival subset specification for PDF, plays an important role in records and document management. The standard was developed at the instigation of the manufacturing industry, which required a recognized, robust PDF for archiving its production documents. PDF/A was first published at the end of 2005 as ISO 19005-1. Companies and public institutions benefit from PDF/A because it helps them to make archival-quality digital documents. While the format was initially used primarily for scanned paper, replacing TIFF as the capture format for permanent archives, PDF/A is now used predominantly for digitally generated documents. The format is widespread, especially in Europe.

Since PDF/A is an archival format, it is important to recognize that all iterations of the standard, from PDF/A-1 onward, will remain valid forever. On the other hand, it makes sense for new documents to use the newest part of the standard, PDF/A-4, because it is based on the most recent PDF version (PDF 2.0), allowing for as many PDF features as possible and therefore minimizing conversion needs.

Unlike earlier parts of PDF/A, PDF/A-4 has native support for PDF 2.0 features such as state-of-the-art security features (PAdES). PDF/A-4 is divided into the main standard and two optional conformance levels addressing “niche” needs. See this article for an overview of the advances in PDF/A-4.

Conformance level PDF/A-4f supports embedding of other file formats so that digital folders containing more than one file can be conveniently represented with a single PDF. One such example is ZUGFeRD invoices, in which a machine-readable XML data record with the invoice information is embedded within a human-readable PDF/A-3 file.

Conformance level PDF/A-4e is the successor of PDF/E (engineering) and specifies how to archive PDF files with 3D models and accompanying information.

PDF/VT & PDF/VCR for variable data and transactional printing

PDF/VT (ISO 16612) is the exchange format for variable data and transactional printing, first published in 2010. It is based on PDF/X, provides an alternative to PCL, PPML, AFP, etc., and addresses new trends in printing technology such as individualization and digital printing. The latest edition, PDF/VT-3, is based on PDF/X-6 and PDF 2.0 (ISO 32000-2:2020).

PDF/UA for accessible documents

In this ISO standard “UA” stands for universal accessibility. PDF/UA defines how texts, images, forms and other content must be created so that people with disabilities – and machines – can use them. PDF/UA helps organizations meet legal requirements for providing access to electronic information – for example in public institutions, insurance companies and banks.

Overall, PDF/UA begins with many requirements similar to those of PDF/A-2u. In addition, the ISO standard for accessible PDF includes requirements for the representation of document structure: headings, paragraphs, columns, tables and alternative texts for images are encoded in the “tagging structure” of a PDF file. Generating document structure information automatically after the fact is time-consuming, so PDF/UA generation usually begins at the original document.

PDF/UA can also be seen as the successor to the conformance level “A” from PDF/A. Of all the PDF standards discussed in this article, PDF/UA places the highest demands on generation and always requires human interaction.

Prognosis: Will PDF standards be combined in the future?

In order to keep the creation of standards-compliant PDFs as simple as possible, the PDF subset standards mentioned above will probably continue to exist separately. Happily, the current generation of PDF subset standards is written in such a way that PDF files conforming to multiple subsets are possible without great effort, provided that one has control over the creation workflow.

The ISO committees are working to ensure that the newest versions use exactly the same formulations wherever possible, so that it has become easier for software vendors to support more than one standard. Some special knowledge is required to generate such “multi standard” PDFs with creation applications such as Microsoft Word, Adobe InDesign and others, but there are also products that enable subsequent conversion, even in automated processes.
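To make the idea of a “multi standard” PDF more concrete: a file declares the subset standards it claims to follow in its XMP metadata, and several such declarations can sit side by side. The Python sketch below reads a metadata packet and reports those claims. The function name and the sample packet are hypothetical, and the namespace URIs and property names are taken, as assumptions of this example, from the identification schemas used by PDF/A, PDF/UA and PDF/X-4 and later. It covers only XMP-based identification, and a claim in the metadata is not proof of conformance; checking that requires a dedicated validator.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Standard label -> (identification namespace, property name).
# Assumed from the respective identification schemas; not exhaustive.
IDENTIFIERS = {
    "PDF/A": ("http://www.aiim.org/pdfa/ns/id/", "part"),
    "PDF/UA": ("http://www.aiim.org/pdfua/ns/id/", "part"),
    "PDF/X": ("http://www.npes.org/pdfx/ns/id/", "GTS_PDFXVersion"),
}


def claimed_standards(xmp: str) -> dict:
    """Return the subset standards an XMP packet claims, e.g. {"PDF/A": "2"}."""
    root = ET.fromstring(xmp)
    claims = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for label, (ns, prop) in IDENTIFIERS.items():
            key = f"{{{ns}}}{prop}"
            # XMP allows simple properties as attributes or as child elements.
            value = desc.get(key)
            if value is None:
                child = desc.find(key)
                if child is not None and child.text:
                    value = child.text.strip()
            if value is not None:
                claims[label] = value
    return claims


# Hypothetical packet for a file claiming both PDF/A-2a and PDF/UA-1.
# The xpacket wrapper of a real packet is omitted here for brevity.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
        xmlns:pdfuaid="http://www.aiim.org/pdfua/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>A</pdfaid:conformance>
      <pdfuaid:part>1</pdfuaid:part>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(claimed_standards(SAMPLE_XMP))
```

Because each standard uses its own namespace, adding a further claim does not disturb the existing ones, which is part of what makes combined files practical.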