Making standards smarter

About the author: Peter Wyatt is the PDF Association’s CTO and an independent technology consultant with deep file format and parsing expertise, who is a developer and researcher actively working on PDF technologies …
October 13, 2021
by Peter Wyatt


As a consumer of standards, have you ever stopped and critically reviewed what a standard essentially is and how it is delivered to your desk for you to read and understand? Have you ever stopped to think about the way that standards are delivered today and whether that’s the best they can be? The theme for World Standards Day on 14 October 2021 is a “shared vision for a better world”, so it is an appropriate time to reflect on what the future of standards publishing may look like… and how distant that future is.

Standards are critically important documents for many industries. They cover a diverse range of topics, from materials to management, from safety to building and construction, aviation and automotive, and, of course, technology.

The PDF format is defined by a 1,000-page ISO standard (ISO 32000), with many specialized uses of PDF also defined as ISO standards. For example, PDF/X was the first subset of PDF to be ISO-standardized, back in 2001, to support the graphic arts industry, and there are now numerous parts to ISO 15930. PDF/A supports long-term preservation of digital documents and is defined by the ISO 19005 family of standards; PDF/UA (ISO 14289) defines how to make digital documents accessible; PDF/VT for variable data printing is ISO 16612; and the list goes on.

Today, standards are published by many organizations as “documents”. Historically these were published as physical paper documents, but nowadays all standards are available digitally as PDF because PDF is the best format for paginated content requiring a precise and repeatable appearance.

The content and publishing processes used by many standards bodies have moved forward slowly, and not always in full alignment with the interests of subject matter experts or consumers. Modern standards distributed as PDF are often not much more than black-and-white static “digital paper”, and rarely utilize the rich capabilities of the format. Although standards may be difficult to create because of the expertise and skills required, standards can also be extremely difficult to use and consume.

Why don’t all standards published as PDFs include full navigation aids, such as nicely hierarchical bookmarks (outlines) and internal hyperlinks?

Standards are often very long, text-centric documents that must fit a standardized template that was itself created with hard copy in mind. The mere presence of a Table of Contents, Table of Figures and Table of Tables cannot be compared to a dynamic hierarchical navigational aid for longer electronic documents. When created by capable software, “born digital” PDFs can utilize style sheet information from the application authoring environment to create fully hierarchical navigable electronic documents without altering the hard-copy appearance. Flattened bookmarks (where all headings in a document are forced to be the same level) reduce navigability, hamper those using assistive technology, and eliminate a key means by which the reader might understand the logical hierarchy of concepts and information. In addition, references to internal clauses, figures and tables should all be active hyperlinks allowing easy navigation between pages and concepts – not only making a document easier to navigate but ensuring that links target the correct location. Much of this happens ‘for free’ in modern word processors. So why aren’t these features always reflected in generated PDFs?

Screen-shot of a table of contents.
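
To make the difference between flattened and hierarchical bookmarks concrete, here is a minimal Python sketch of how a list of styled headings could be nested into an outline tree. The `Bookmark` class, the `build_outline` function and the sample headings are illustrative assumptions only, not part of any PDF library or authoring tool.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """One outline entry; `children` holds nested sub-headings (hypothetical type)."""
    title: str
    children: list["Bookmark"] = field(default_factory=list)


def build_outline(headings: list[tuple[int, str]]) -> list[Bookmark]:
    """Nest (level, title) pairs into a bookmark tree.

    Level 1 is a top-level clause, level 2 a sub-clause, and so on.
    """
    roots: list[Bookmark] = []
    stack: list[tuple[int, Bookmark]] = []  # open ancestors, deepest last
    for level, title in headings:
        node = Bookmark(title)
        # Close every open heading at the same or a deeper level.
        while stack and stack[-1][0] >= level:
            stack.pop()
        if stack:
            stack[-1][1].children.append(node)
        else:
            roots.append(node)
        stack.append((level, node))
    return roots


def render(nodes: list[Bookmark], depth: int = 0) -> list[str]:
    """Return the tree as indented lines, one per bookmark."""
    lines = []
    for node in nodes:
        lines.append("  " * depth + node.title)
        lines.extend(render(node.children, depth + 1))
    return lines


# Illustrative headings, as a word processor's heading styles might expose them.
headings = [
    (1, "1 Scope"),
    (1, "7 Syntax"),
    (2, "7.3 Objects"),
    (3, "7.3.4 String objects"),
    (2, "7.5 File structure"),
    (1, "14 Document interchange"),
]
outline = build_outline(headings)
print("\n".join(render(outline)))
```

A flattened outline would simply be the six titles at one level; the nested form keeps the parent and child relationships that a reader, or assistive technology, can collapse and expand.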

Why don’t all standards include active links to all referenced materials, especially normative and bibliographic references?

Standards commonly have two sets of references – normative references for “those documents which are cited in the text in such a way that some or all of their content constitutes requirements of the document” and bibliographic references for “those documents which are cited informatively in the document, as well as other information resources.” (ISO/IEC Directives, Part 2, Principles and rules for the structure and drafting of ISO and IEC documents) Normative references are thus especially important for readers; ensuring that the correct document is referenced is vital to correct understanding. In the real world, variations in the way that documents are described in prose can make locating the precise version of a normative document very difficult, and this is a factor driving variations in implementations of the standard. For PDF and ISO 32000-2, the PDF Association has created a dedicated resource to address this need, removing the potential for confusion and this source of malformed PDF files or parsing differentials.

Why aren’t standards made more accessible for people with disabilities, with good semantic tagging, alternate text for images, MathML for every formula, etc.? 

World Standards Day 2021 has a theme of a “shared vision for a better world” and has been established around the UN Sustainable Development Goals (SDGs), so it is very timely to consider making every standard fully accessible for those with disabilities. ISO 14289 (PDF/UA) defines specific file format and assistive technology requirements that could be applied, but simply ensuring that new PDF standards are always “tagged” (using the Tagged PDF feature defined in ISO 32000, 14.8) typically enhances accessibility.

Why don’t standards have standardized metadata? 

In many regulated industries, managing standards is a critical part of ensuring compliance. Historically done with paper documents, management processes are now typically performed in document management systems using proper controls. Nonetheless, finding a specific edition of a specific standard can be made much harder due to the lack of standardized metadata in the PDFs. A single standard can be published by multiple bodies with different reference numbers or titling conventions. Countries can also adopt international standards nationally, possibly with additional text changes or translations, creating even more variants. These nuances can all be subtle for the majority of users who are not intimately familiar with the standard. Neither the widely used BibTeX bibliographic information file format nor the Dublin Core metadata define data structures that can support the requirements for encoding standards metadata – it’s ironic indeed that standards-related metadata is itself not yet standardized!

Screen-shot of a metadata dialog.
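
As a rough illustration of the kind of information that standardized metadata would need to capture, the Python sketch below models an edition identifier and a national adoption that points back to its source edition. Every field name here is a hypothetical choice made for this example, and "NSB 9999:2021" is an invented national adoption; no existing metadata schema is implied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StandardId:
    """Identifies one edition of one standard (hypothetical structure)."""
    publisher: str        # e.g. "ISO"
    number: str           # e.g. "32000"
    part: Optional[str]   # e.g. "2", or None for single-part standards
    year: int             # publication year of this edition

    def reference(self) -> str:
        part = f"-{self.part}" if self.part else ""
        return f"{self.publisher} {self.number}{part}:{self.year}"


@dataclass(frozen=True)
class StandardRecord:
    """Metadata for a published document (hypothetical structure)."""
    ident: StandardId
    title: str
    language: str = "en"
    adopted_from: Optional[StandardId] = None  # source edition, for adoptions
    modified: bool = False                     # True if the adoption changes the text

    def source(self) -> StandardId:
        """The edition this document ultimately represents."""
        return self.adopted_from or self.ident


def same_source_edition(a: StandardRecord, b: StandardRecord) -> bool:
    """True when two documents derive from the same source edition."""
    return a.source() == b.source()


international = StandardRecord(StandardId("ISO", "32000", "2", 2020), "PDF 2.0")
# Invented national adoption with a different number and year.
national = StandardRecord(
    StandardId("NSB", "9999", None, 2021),
    "PDF 2.0 (national adoption)",
    adopted_from=international.ident,
)
print(international.ident.reference(), national.ident.reference())
print(same_source_edition(international, national))
```

With the relationship recorded explicitly, a document management system could recognize that two differently numbered documents carry the same underlying edition, which is difficult to infer from titles alone.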

Why don’t standards make more use of color for better communication?

Historically, standards were hard-copy documents printed on black-and-white printers, before color printers were affordable. Today most standards are accessed electronically and almost all devices have full color displays. The use of color can be very helpful to highlight important information or communicate complex concepts more clearly, as it provides options beyond typeface formatting such as bold or italic, or subtle gray shadings or hatching that can vary depending on screen resolutions. Modern authoring software is also more than capable of supporting the intelligent use of color! However, the use of color must also be weighed against the need to support accessibility requirements for those who are unsighted (see above).

Why do so many standards publish formulae and special symbols as rasterized images?

Many technical standards contain specialized symbols, complex formulas, or equations. The vast majority of such symbols are supported by modern Unicode, which includes over 144,000 characters. Authors should always use the advanced equation editors available in most modern word processing software, as these specialized editors can output not only more efficient vector-based representations but MathML as well. The use of MathML is especially important as it allows those with disabilities who use assistive technology to understand equations.
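
As a small illustration of why text-based symbols and MathML carry more information than a picture of a formula, the Python sketch below looks up the Unicode names of a few symbols and builds presentation MathML for E = mc² using only the standard library. The helper name `mathml_energy` is an assumption made for this example; real authoring tools generate such markup themselves.

```python
import unicodedata
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Each symbol is a real character with a name that software can announce.
for symbol in ["\u2211", "\u00b1", "\u2264"]:
    print(symbol, unicodedata.name(symbol))


def mathml_energy() -> str:
    """Build presentation MathML for E = m c^2 (illustrative helper)."""
    math = ET.Element("math", xmlns=MATHML_NS)
    row = ET.SubElement(math, "mrow")
    ET.SubElement(row, "mi").text = "E"
    ET.SubElement(row, "mo").text = "="
    ET.SubElement(row, "mi").text = "m"
    power = ET.SubElement(row, "msup")   # base followed by superscript
    ET.SubElement(power, "mi").text = "c"
    ET.SubElement(power, "mn").text = "2"
    return ET.tostring(math, encoding="unicode")


markup = mathml_energy()
print(markup)
```

A rasterized image of the same equation offers none of this structure: there is no base, no exponent and no operator for assistive technology to read out.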

Why do so many standards publish figures, illustrations and graphs as rasterized bitmaps, rather than scalable vector graphics with text?

The variation in today’s devices causes pre-rasterized figures and illustrations to sometimes appear pixelated or blurry, or to be difficult to read and navigate on small screens. Scalable vector graphics are generally more efficient to store, are more amenable to being resized with pan and zoom, and provide more options for consumption to those using assistive technology to read.

What if standards included audio, movies, interactive 3D content, or other kinds of rich media content?

With the rich media capabilities of modern devices, many standards could supplement dense text with sound, video or interactive 3D content. How much more understandable would a future plumbing, electrical or mechanical engineering standard be if the normative document itself included videos, animations or 3D engineering drawings that could be exploded and then reassembled correctly? How much more comprehensible would a technical standard be if current 2D surface plots were interactive 3D plots that could be rotated and viewed from any angle? 3D PDF has provided this capability for over a decade!

What if standards included attachments such as checklists, associated data in tabular forms, related schemas, samples, examples, etc.?

Many traditional standards include numerous annexes with pages laid out as static forms or checklists, with a design oriented toward printing and handwriting. For almost 30 years PDF has provided an interactive, rich forms experience including checkboxes, dropdown lists, etc. Tabular data is very common in standards documents, with some standards including complex tables spanning multiple pages. This ‘printed’ information is often derived from a spreadsheet or database, converted to text for the purposes of publishing as a standard, and then converted back to a spreadsheet or database by the purchaser of the standard… this is not an efficient or error-free process! A much better approach is to associate an embedded, usable attachment (such as a CSV or spreadsheet file) with the table in the standard, reducing effort and the potential for errors.
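
The following Python sketch illustrates the data side of that idea: a table is serialized once to CSV bytes (the payload that could be attached to the PDF) and read back without any retyping. The table contents and function names are invented for illustration, and the step of embedding the bytes in a PDF is left out because it depends on the PDF library in use.

```python
import csv
import io


def table_to_csv(header: list[str], rows: list[list[str]]) -> bytes:
    """Serialize a table to UTF-8 CSV bytes suitable for use as an attachment."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


def csv_to_table(data: bytes) -> tuple[list[str], list[list[str]]]:
    """Parse CSV bytes back into a header row and data rows."""
    reader = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
    header, *rows = list(reader)
    return header, rows


# Invented sample data; note the embedded comma and quotation marks.
header = ["Clause", "Requirement", "Limit (mm)"]
rows = [
    ["5.2", "Wall thickness, minimum", "3.5"],
    ["5.3", 'Marking shall include the text "TESTED"', ""],
]

payload = table_to_csv(header, rows)
restored_header, restored_rows = csv_to_table(payload)
print(restored_header)
print(restored_rows)
```

Because the `csv` module quotes commas and quotation marks on the way out and restores them on the way in, the purchaser receives exactly the values the authors published, which is the point of shipping the data alongside the printed table.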

Some standards are really just databases rendered as prose-based standards. For example, the ISO 3166 family of standards simply defines the list of country codes and subdivisions; representing this data as a textual document is not helpful. A better representation would be the same (or similar) PDF of the page with attached data files (for example, the CSV, XLS and XML downloads that ISO already offers separately), where the PDF document is not much more than a title page, metadata (see above!), copyright information, and short scope statements. This “database standard” in a single PDF package is then fully manageable like all other PDF-based standards in document management systems or organizational workflows.

Some ISO standards require additional data – often packaged as ZIP archives – as separate artifacts from the actual standard itself (ISO uses the term “electronic insert” to describe such data). PDF has always supported attached or embedded files, so providing a single PDF package including both the text and all related data would be far simpler for delivery and document management for both standards organizations and their customers.

What if future standards were fully machine-readable and human-readable?

For technical standards, such as the PDF file format, the ultimate future specification would be a dual machine- and human-readable standard. Such a document would help mitigate the need for each and every developer to interpret lengthy phrasing, possibly missing many subtleties and nuances that may only be noticeable to native speakers. Developers could instead leverage unambiguous definitions that could be transformed into code or other software artifacts. ISO has started a Strategic Advisory Group on machine readable standards (MRS) but outcomes from this work are not yet published.
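
To give a feel for what "definitions that could be transformed into code" might mean, here is a deliberately simplified Python sketch in which a few requirements are expressed as data and applied by a generic checker. The rule format is hypothetical, the three rules are only loosely modelled on entries of a PDF document catalog, and plain Python dictionaries stand in for parsed PDF objects; none of this is a normative definition.

```python
# Requirements expressed as data rather than prose (hypothetical rule format).
CATALOG_RULES = [
    {"key": "Type", "type": str, "required": True},
    {"key": "Pages", "type": dict, "required": True},
    {"key": "Version", "type": str, "required": False},
]


def check(obj: dict, rules: list[dict]) -> list[str]:
    """Return a list of problems found when applying `rules` to `obj`."""
    problems = []
    for rule in rules:
        key = rule["key"]
        if key not in obj:
            if rule["required"]:
                problems.append(f"missing required key: {key}")
            continue
        if not isinstance(obj[key], rule["type"]):
            problems.append(f"wrong type for key: {key}")
    return problems


good = {"Type": "Catalog", "Pages": {}}
bad = {"Type": "Catalog", "Version": 2}

print(check(good, CATALOG_RULES))
print(check(bad, CATALOG_RULES))
```

The same rule table could equally be rendered as the human-readable table in the published document, so the prose and the machine-checkable form never drift apart.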

Many of the PDF features described above are, of course, also available via classical web technologies. However, the web requires an active internet connection and is not ideally suited to long paginated content. A standard spread across multiple web pages, with on-demand downloadable content and complex hyperlinking, is also very difficult to archive as a single authoritative publication. PDFs are stand-alone files that can include forms, movies, interactive 3D content, rich media content, attachments and associated data, and that can easily be archived or managed in existing document management systems. PDFs also support encryption and digital signatures, so authenticity and provenance can also be ensured.

The business model for some standards development organizations (SDOs) involves publishing and selling standards to financially support the creation of further standards. Such models typically rely on PDF technology already, so extending the content of PDF to include richer content will not disrupt this status quo. Other SDO business models publish their standards for free, but require participants to pay to create the standards. These free publications may be HTML, but PDF equivalent downloads are often also provided. In both cases, PDF is currently a pivotal technology, so extending PDFs to include richer content will not alter either business model.

Although authoring more advanced content is more burdensome, it is fundamentally the same problem as authoring equivalently capable and accessible web content. If an image on a web page requires alternative text to be accessible, then the same alternative text is used in PDF. If tabular data on a website requires ARIA attributes to be accessible, then that same accessibility information can be used in the PDF. All modern office suites can automatically create these more advanced PDFs, adding tags for accessibility, bookmark navigation aids based on heading levels, alternative text on images, and richer PDF metadata taken directly from the application-specific metadata. PDF utilizes the same publishing workflows as web technologies, so there is no burden beyond what accessible web publishing already requires.

The PDF Association is uniquely positioned to influence the future of smart standards:

  • The PDF Association’s mission is to “promote Open Standards-based electronic document implementations using PDF technology through education, expertise and shared experience for stakeholders worldwide” so we are strongly invested in ensuring that open standards benefit all users of PDF across the globe;
  • As the Secretariat to ISO TC 171 SC 2 we are continually engaging with ISO in the creation of PDF-based standards;
  • As a Category A Liaison to ISO TC 171 SC 2 our members are active participants in the technical development and creation processes of PDF and related standards;
  • As experts in PDF, we constantly need to stay abreast of, and refer to, other dependent standards from ISO, IETF and other standards bodies. We are a consumer of standards as much as a producer of standards!
  • And as the leading industry and technical body for the PDF industry, we know more about PDF and what PDF can deliver than anyone else.

The PDF Association can help standards organizations significantly improve their products (“standards”) by updating and improving their existing PDF publication processes, without lengthy, disruptive or costly digital transformation initiatives. So on World Standards Day 2021, and as both consumers and producers of standards, let’s have a discussion about a “shared vision for a better world” with better standards documents.