Announcing the Email Archiving with PDF Liaison Working Group
Excerpt: Almost every email client can save email messages as PDF files, but none do so in a manner that retains email structures or metadata proving message authenticity. The EA-PDF specification development effort will explore a better way.
About the author: The staff of the PDF Association are dedicated to delivering the information, services and value members have come to expect.
As a means of communication, email is ubiquitous (Prom, Preserving Email, 2019: 4). As a result, an email is often the only evidence of a transaction or interaction between individuals. Yet email is surprisingly easy to forge or tamper with. It is therefore critical that the file formats used to represent email outside of its original systems capture and retain the metadata necessary to demonstrate trustworthiness.
Like email, PDF is ubiquitous. Unlike email, PDF is defined by an ISO standard (ISO 32000), and is employed worldwide to capture a wide variety of source document formats in a platform-independent manner. Today, almost every email client includes the ability to save email messages as a PDF file, but none do so in a manner that retains email structures or metadata proving message authenticity. Such outputs are plain “digital paper” – fixed versions of the messages lacking email’s core attributes.
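To make the "digital paper" problem concrete, here is a minimal Python sketch that separates the headers a printed rendering usually shows from those it usually drops. The sample message, its addresses, and the choice of "visible" headers are all invented for illustration; real email clients differ in what they print.

```python
from email import message_from_string
from email.policy import default

# Hypothetical sample message; every address and header value is invented.
RAW_MESSAGE = """\
Message-ID: <1234@example.org>
Date: Mon, 01 Jul 2019 09:30:00 +0000
From: Alice <alice@example.org>
To: Bob <bob@example.com>
Subject: Contract approval
Received: from mail.example.org by mx.example.com; Mon, 01 Jul 2019 09:30:05 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

The contract is approved.
"""

# Assumption: a typical "save as PDF" page shows only these headers.
VISIBLE_HEADERS = {"from", "to", "cc", "date", "subject"}


def split_headers(raw: str):
    """Return (visible, hidden) dicts mapping header names to lists of values."""
    msg = message_from_string(raw, policy=default)
    visible, hidden = {}, {}
    for name, value in msg.items():
        target = visible if name.lower() in VISIBLE_HEADERS else hidden
        target.setdefault(name, []).append(str(value))
    return visible, hidden


visible, hidden = split_headers(RAW_MESSAGE)
print("Shown on the page:", sorted(visible))
print("Lost in a flat PDF:", sorted(hidden))
```

In this sketch the routing and identification headers (Message-ID, Received) and the MIME structure headers end up in the "lost" group. These are the kinds of fields an archival representation would need to carry along with the visual rendering.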
There is a better way.
Email technology does not include a concept of a “native” email presentation; preservation outside the source systems implies some degree of transformation. PDF, on the other hand, is broadly adopted for presentation and preservation purposes. The format is prevalent throughout business and industry, with viewers included with the operating systems and browsers on almost all consumer devices.
While emails can be exported, stored, and preserved in something approaching their native formats (for example, PST, MBOX, or EML files), those files are typically rendered and viewed with email software. For security, the potential for confusion, and other reasons, many people are simply uncomfortable with the notion of importing others’ archived email into their own email systems. Likewise, most repository software does not natively display these formats. So long as email source data is preserved, well-considered packaging and representation of email using PDF can provide a straightforward, ubiquitous, and highly secure way to access and view archived messages, complementing preservation approaches such as those treated at length in the Future of Email Archiving Report.
While sometimes underappreciated as such, PDF is a natural target format for email preservation. Existing package structures, such as MBOX, reflect application-specific features, and content cannot be easily and reliably rendered outside of an email client environment. Domain-specific tools rely on internal databases and are not independent preservation solutions. PDF, on the other hand, is a supported file format in most existing preservation repositories and digital libraries. In addition to its familiar page rendering capability, PDF is a highly structured and documented container format supporting dozens of document-specific features and rich capabilities. PDF technology represents, effectively, a platform-independent free-form database with built-in standardized support for ISO 16684 XMP (Extensible Metadata Platform). These qualities explain its broad appeal and implementation, as well as its suitability for packaging metadata together with visual content. Relevant archival user communities, including local, state, and federal archives, as well as museum archives, university archives, and special collection units, have requested PDF-based archiving options for email (Task Force on Technical Approaches for Email Archives: 82-83).
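As a rough illustration of what "packaging metadata together with visual content" could look like, the following Python sketch builds a small XMP packet that records a message's subject, Message-ID, and sender. The `mail` namespace and its property names are purely hypothetical placeholders; they are not taken from any EA-PDF specification, which had yet to be written. Embedding the packet in a PDF metadata stream would require a PDF library and is not shown.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
X_NS = "adobe:ns:meta/"
XML_NS = "http://www.w3.org/XML/1998/namespace"
# HYPOTHETICAL namespace, invented for this sketch only.
MAIL_NS = "http://example.org/ns/email/1.0/"

for prefix, uri in (("rdf", RDF_NS), ("dc", DC_NS), ("x", X_NS), ("mail", MAIL_NS)):
    ET.register_namespace(prefix, uri)


def build_xmp(subject: str, message_id: str, sender: str) -> str:
    """Return an XMP packet (as text) describing one email message."""
    xmpmeta = ET.Element(f"{{{X_NS}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})

    # dc:title is a language alternative: rdf:Alt with an x-default item.
    title = ET.SubElement(desc, f"{{{DC_NS}}}title")
    alt = ET.SubElement(title, f"{{{RDF_NS}}}Alt")
    item = ET.SubElement(alt, f"{{{RDF_NS}}}li", {f"{{{XML_NS}}}lang": "x-default"})
    item.text = subject

    # Hypothetical email-specific properties.
    ET.SubElement(desc, f"{{{MAIL_NS}}}messageId").text = message_id
    ET.SubElement(desc, f"{{{MAIL_NS}}}from").text = sender

    body = ET.tostring(xmpmeta, encoding="unicode")
    # Standard XMP packet wrapper processing instructions.
    header = '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    trailer = '<?xpacket end="w"?>'
    return f"{header}\n{body}\n{trailer}"


packet = build_xmp("Contract approval", "<1234@example.org>", "alice@example.org")
print(packet)
```

Because XMP is ordinary RDF/XML, a repository could read these values back with any XML parser, without needing an email client.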
The EA-PDF LWG
Having obtained a grant to continue exploration of the EA-PDF concept, the University of Illinois has chosen to fund the PDF Association to develop a detailed technical specification defining the interoperable use of PDF (ISO 32000) and, possibly, PDF/A (ISO 19005) as an archival medium for email.
In a separate effort, the University will also fund development of an open-source proof-of-concept implementation of the specification (member contributions of technology and expertise toward the proof of concept are also very welcome).
The PDF Association’s procedures for development of the specification are fully aligned with ISO’s, so that any outcome has the potential to become an ISO standard. Its Liaison Working Group model allows PDF Association members to discuss and vote on the documents produced, while allowing non-members to monitor the discussion and contribute their expertise.
Learn more about the EA-PDF LWG.
Read the University of Illinois’s own announcement.