The Arlington PDF Model – Around the Blocks

This talk presents the Arlington PDF Model, the first open-access, vendor-neutral, comprehensive, specification-derived, machine-readable definition of all formally defined PDF objects and their intra- and inter-object relationships. The model captures the bulk of the latest 1,000-page ISO PDF 2.0 specification as a text-based, machine-readable definition of the entire PDF DOM. It establishes a state-of-the-art “ground truth” for future PDF research and for implementers. Whether driven by trivial Linux commands, simple scripts, or more advanced programs, the model supports a multitude of use cases, including test case generation, validation of extant data, parser generation, modelling, and rapid forensic analysis of PDF syntax fragments.