This talk will present industry-relevant outcomes from the DARPA-funded “SafeDocs” research program and review future directions. The research aims to reduce the attack surface of parsing software and to identify and remove inherent weaknesses in file formats, including PDF and its many nested formats. Outcomes include:
- the “Issue Tracker” corpus comprising thousands of stressful PDF sample files;
- the “PDF Observatory”, a cloud-based corpus analysis tool supporting instant queries across an internet-scale corpus;
- the specification-derived, machine-readable “Arlington PDF Model” definition of all PDF 2.0 DOM objects;
- and a reflection on weaknesses in the PDF file format, their exploits, and possible solutions.