Are you leaving document archiving to chance?

Excerpt: Dietrich von Seggern from callas software explains why companies should not leave the generation of PDF/A to their staff, but should prefer a centralized solution.

About the author: Dietrich von Seggern received his degree as a printing engineer, and in 1991 started his professional career as head of desktop prepress production in a reproduction house. He became involved in …

October 20, 2021
by Dietrich von Seggern

Companies and public institutions benefit from PDF/A because the standard allows documents to be archived for the long term. While the format was originally used as a replacement for scanned paper or TIFFs in archives, it is now used primarily for digitally created documents. The format has become widely accepted, and numerous software products, such as Microsoft Office or LibreOffice, already offer an export function to PDF/A. The PDF/A documents generated in this way are of good quality and meet the requirements of the standard. Nevertheless, there are several reasons why companies should not leave the generation of PDF/A to their staff but should prefer a centralized solution. Read on to find out why.

One argument in favor of this is that not all employees are aware of the “Export to PDF/A” function, and there is no way to ensure that those who know it will actually use it. Setting the option is easily forgotten when numerous documents have to be converted. In addition, the PDF/A option is not available in all authoring programs. In such cases, the most direct route is often via a printer driver, with serious consequences. First of all, the result is only a PDF and not a PDF/A file. Moreover, even a subsequent conversion to PDF/A cannot restore the information that was lost along the way.

An office printer only needs to reproduce the objects visible on the page. Digitally created files, however, often carry additional information that should be preserved during archiving. This includes metadata, such as the author’s name, and tagging structures that map content characteristics such as headings or reading order.

This metadata facilitates targeted searching and identification of documents, enables their automated processing, and simplifies their association with other documents or processes. For example, it can be used to automatically index documents when they are transferred to an enterprise content management (ECM) system.

Automate the process

Based on these arguments, it is advisable to centrally automate the conversion of Office files to PDF/A and thus ensure that “clean” files are created without any loss of information. The conversion should run on a server, especially if the document volume is high.

There are various options for central, automatic processing. The simplest variant is based on hot folders. A hot folder has an associated profile and several output folders. All files that arrive in a hot folder are picked up automatically, processed with the selected profile without manual intervention, and then stored in the appropriate target folders. Modern conversion solutions offer extensive functions that, for example, repair invalid PDF files, complete partially embedded fonts, embed missing fonts, and correct inconsistent metadata. Appropriate reporting informs the user about files that caused problems during conversion, for example because a file is password-protected.
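The hot folder behavior described above can be illustrated with a minimal Python sketch. It is not the implementation of any particular product: the HotFolder class, the process_once function, and the converter signature are hypothetical names chosen for illustration, and the actual conversion is passed in as a function so that any conversion tool could stand behind it. Real solutions typically watch the folder continuously and offer far more detailed reporting.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical converter signature (an assumption for this sketch): it takes
# a source path and a destination path and raises an exception if the file
# cannot be converted, e.g. because it is password-protected.
Converter = Callable[[Path, Path], None]


@dataclass
class HotFolder:
    inbox: Path         # folder in which incoming files arrive
    success: Path       # output folder for converted files
    failed: Path        # output folder for files that caused problems
    convert: Converter  # stands in for the profile bound to this folder


def process_once(folder: HotFolder) -> list[str]:
    """Process every file currently in the inbox and return a simple report."""
    report = []
    for directory in (folder.success, folder.failed):
        directory.mkdir(parents=True, exist_ok=True)

    for src in sorted(folder.inbox.iterdir()):
        if not src.is_file():
            continue
        target = folder.success / (src.stem + ".pdf")
        try:
            folder.convert(src, target)
            src.unlink()  # the original leaves the inbox once it is converted
            report.append(f"OK {src.name}")
        except Exception as exc:
            target.unlink(missing_ok=True)  # discard any partial output
            shutil.move(str(src), str(folder.failed / src.name))
            report.append(f"FAILED {src.name}: {exc}")
    return report
```

Calling process_once at regular intervals, for example from a scheduler, would approximate the unattended operation described above. Problem files end up in a separate folder together with a report entry, so nobody has to inspect each document by hand.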

More elegant and direct automation options are available through integration with broader workflows via scripting or programming. The same range of functions is available for the conversion itself.

Another argument in favor of centralized conversion is that quality assurance can then also be performed centrally. Specifically, this involves validation, i.e. checking whether the files that are supposedly PDF/A actually comply with the specifications of the ISO standard. In the more reliable conversion tools, validation runs automatically after each processing operation. It is also advisable to use tools that are compatible with the veraPDF test corpus.
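A scripted integration that validates every file immediately after conversion might look roughly like the following sketch. All names here (ValidationResult, convert_and_validate, run_batch) are hypothetical, and both the converter and the validator are passed in as plain functions: they stand in for whatever tools are actually used and do not reflect the API of a specific product.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ValidationResult:
    compliant: bool      # True if the file conforms to the PDF/A standard
    messages: list[str]  # problems reported by the validator, if any


# Hypothetical signatures (assumptions for this sketch):
# the converter takes a source path and returns the path of the created file,
# the validator takes that path and returns a ValidationResult.
ConvertFn = Callable[[str], str]
ValidateFn = Callable[[str], ValidationResult]


def convert_and_validate(
    source: str, convert: ConvertFn, validate: ValidateFn
) -> tuple[str, ValidationResult]:
    """Convert one file and validate the result in the same step."""
    output = convert(source)
    return output, validate(output)


def run_batch(
    sources: list[str], convert: ConvertFn, validate: ValidateFn
) -> tuple[list[str], dict[str, list[str]]]:
    """Return the compliant outputs and, per rejected source, the reasons."""
    accepted: list[str] = []
    rejected: dict[str, list[str]] = {}
    for source in sources:
        output, result = convert_and_validate(source, convert, validate)
        if result.compliant:
            accepted.append(output)
        else:
            rejected[source] = result.messages
    return accepted, rejected
```

Because validation is part of the same call as conversion, a file that only claims to be PDF/A never reaches the archive unnoticed. It is listed among the rejected files together with the validator's messages.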

Conversion and validation in one go

In principle, it is advisable to keep all documents in one format, and PDF is the first choice here as the lowest common denominator for digitally generated or paper-based documents. To relieve employees of the burden of conversion and at the same time ensure that all PDF/A files are of consistently high quality, decision-makers should rely on server-based solutions that include both conversion and validation and that also automate the processes surrounding the handling of PDFs.