Packaging email archives using PDF

Excerpt: EA-PDF establishes high-level requirements for using PDF technology to package email for long-term preservation.



February 11, 2021
by PDF Association staff


Archiving email isn’t easy or obvious. Existing solutions are commonly vendor-specific and require an email client, which is not ideal for static records.

In 2019, the University of Illinois was awarded a grant by the Andrew W. Mellon Foundation to develop conversion criteria and requirements for archiving email into PDF containers. The final report, “A Specification for Using PDF to Package and Represent Email”, is now available from the University of Illinois IDEALS Repository.

The report detailing the EA-PDF concept establishes high-level functional requirements for using ISO 32000 (PDF) technology as a model for packaging email for long-term preservation purposes. These requirements detail desirable functionality reflecting considerable input from stakeholders in digital preservation, government, education and industry communities.

PDF combines ubiquity and acceptance, rich capabilities, and an open, well-documented specification that is already supported by a global ecosystem of developers. PDF facilitates redaction and includes advanced digital signature technology, XMP metadata, semantic tagging, associated files, rich media, 3D and many other technologies that make it highly effective for many digital content archiving applications. PDF’s magic lies in its reliability and interoperability, so using (and truly leveraging) PDF specifically for email archives turns out to be a reasonable application of the technology.

Conceptually, EA-PDF is no more complex than the underlying source email, but it represents that complexity in a formally defined manner within the structures of the PDF container. MBOX, EML, and other email formats are less well-defined formats than families of formats, shaped more by client implementations than by authoritative specifications. PDF provides a means to represent these variants in a normalized packaging model, regardless of the underlying source.
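To make the idea of a normalized model concrete, here is a minimal Python sketch that reads the same message from an EML file and from an MBOX file and reduces both to one source-independent record. It is an illustration only, not part of the EA-PDF report: the record fields and the helper names (normalize, read_eml, read_mbox, build_sample) are hypothetical, only the Python standard library is used, and the sketch stops short of writing any PDF.

```python
import mailbox
from email import message_from_bytes, policy
from email.message import EmailMessage
from pathlib import Path


def normalize(msg):
    """Reduce a parsed message to a source-independent record.

    The field set is hypothetical, chosen for illustration only.
    """
    def header(name):
        value = msg[name]
        return None if value is None else str(value)

    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "message_id": header("Message-ID"),
        "from": header("From"),
        "to": header("To"),
        "subject": header("Subject"),
        "date": header("Date"),
        "body": body.get_content().strip() if body is not None else "",
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }


def read_eml(path):
    """Parse a single-message EML file into one normalized record."""
    raw = Path(path).read_bytes()
    return normalize(message_from_bytes(raw, policy=policy.default))


def read_mbox(path):
    """Parse every message in an MBOX file into normalized records."""
    box = mailbox.mbox(str(path), create=False)
    try:
        return [
            normalize(message_from_bytes(box.get_bytes(key), policy=policy.default))
            for key in box.iterkeys()
        ]
    finally:
        box.close()


def build_sample():
    """Build a small sample message with one attachment (made-up data)."""
    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "bob@example.org"
    msg["Subject"] = "Quarterly report"
    msg["Message-ID"] = "<sample-1@example.org>"
    msg.set_content("Report attached.")
    msg.add_attachment(
        b"placeholder",
        maintype="application",
        subtype="octet-stream",
        filename="report.bin",
    )
    return msg
```

Writing build_sample().as_bytes() to an .eml file and adding the same bytes to an mbox with mailbox.mbox(...).add(...) should yield identical records from read_eml and read_mbox. That is the point of the sketch: whatever the source container, downstream packaging sees one shape. A real converter would also have to handle client-specific quirks that this example ignores.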

The EA-PDF concept integrates the capture of EML or MBOX content with PDF as a packaging, representation and distribution model for anything from individual emails up to complete mailboxes. By folding the email archiving problem into the EA-PDF framework, “A Specification for Using PDF to Package and Represent Email” offers a thought-provoking take on leveraging the unique power of PDF to cut this Gordian knot.