Datalogics PDF Alchemist 3.1.1: A Summary of What’s New & Improved

Excerpt: Datalogics PDF Alchemist extraction strips out graphics and font formatting to deliver the text without the clutter.


About the author:
Member News

March 16, 2021

Datalogics Inc. recently released an updated version of PDF Alchemist, our premier tool for PDF data extraction.

If you’re looking for PDF extraction results that you can import directly into databases, data analytics platforms, and spreadsheet programs like Excel, we’ve got you covered. Looking for PDF extraction that strips out graphics and font formatting to deliver the text you need without the clutter? We’ve STILL got you covered! We also made HTML management and our in-program help and error handling easier than ever before.

Here’s a summary of the updates:

New CSV Results Output

  • PDF Alchemist users can now export complex data from one application to a CSV (.csv) file, and then import the data in that CSV file into another application.
  • CSV files enable the exchange of data between applications such as databases, spreadsheet programs like Microsoft Excel, and contact and content management systems.
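
As a rough illustration of the import side, here is a minimal Python sketch that reads CSV data with the standard library. The sample data, column names, and helper functions are hypothetical stand-ins, not actual PDF Alchemist output.

```python
import csv
import io

# Hypothetical sample standing in for a CSV file extracted from a PDF table.
# The column names are assumptions made for this illustration only.
SAMPLE_CSV = "Item,Quantity,Unit Price\nWidget,4,2.50\nGadget,10,1.25\n"


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def order_total(rows):
    """Sum quantity * unit price across all rows."""
    return sum(int(r["Quantity"]) * float(r["Unit Price"]) for r in rows)


rows = load_rows(SAMPLE_CSV)
print(rows[0]["Item"])
print(order_total(rows))
```

For a file on disk, the same reader can be pointed at `open(path, newline="")` instead of the in-memory string.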

New Plain Text Output

  • PDF Alchemist can now export to Plain Text files, which contain only the readable characters (text) identified in the document. This output does not contain graphics, images, or document formatting and styling information (font color, size, bullets, or numbering). It may include “whitespace” characters (spaces and line breaks) that help arrange the text.
  • Plain Text files are easy to read and share because they can be opened by nearly any text editor or viewer on any device. Their text-only format displays consistently for all users and is quick to search, copy, and paste.

Improved HTML Management

  • Our machine-friendly HTML (Hypertext Markup Language) output is now also human-friendly to ease HTML audits and management.
  • Typically, HTML powers web browser displays, and machines can easily consume and interpret “minified” output (one continuous line of characters). This was the HTML output format in the previous version of PDF Alchemist.
  • We’ve added line breaks and indentation to keep this output machine-friendly while also making it much easier for human readers to scan, locate, and manage their HTML output.
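
To show the general idea of adding line breaks and indentation to single-line HTML, here is a minimal Python sketch built on the standard library's `html.parser`. It is not how PDF Alchemist formats its output; the `Indenter` class and `pretty_html` function are hypothetical names, and the sketch ignores comments, doctypes, and whitespace-sensitive elements such as `<pre>`.

```python
import html
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class Indenter(HTMLParser):
    """Collects one output line per tag or text run, indented by nesting depth."""

    def __init__(self, indent="  "):
        super().__init__()
        self.indent = indent
        self.depth = 0
        self.lines = []

    def _emit(self, text):
        self.lines.append(self.indent * self.depth + text)

    def handle_starttag(self, tag, attrs):
        # Reuse the tag exactly as written so attributes are preserved.
        self._emit(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: emit it without changing depth.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.depth = max(self.depth - 1, 0)
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:
            # The parser decodes entities, so re-escape &, <, and > on output.
            self._emit(html.escape(text, quote=False))


def pretty_html(minified):
    """Return the markup with one tag or text run per indented line."""
    parser = Indenter()
    parser.feed(minified)
    parser.close()
    return "\n".join(parser.lines)


print(pretty_html("<div><p>Hello</p><br><p>A &amp; B</p></div>"))
```

Because browsers collapse most whitespace between tags, output formatted this way generally renders the same as the single-line form, which is why it can stay machine-friendly while being easier to audit by eye.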

Improved In-program Help & Error Handling

When incomplete or incompatible application parameters are entered, the command line interface responds with:

  • Required and recommended (best practice) application parameters.
  • Relevant, topic-specific help references to support both new and returning users.

New XSLT Stylesheet Sample

  • We’ve added a NEW XSLT sample to showcase the value of XSLT as a content filter.
  • The paragraph.xslt stylesheet sample selects ONLY paragraph content from the input document. This sample could be used by any organization that consumes text-heavy files and only wants to extract paragraphs of content, excluding footnotes, headers, footers, captions, and similar elements.
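
The filtering idea can be sketched in Python as well. The example below is not the paragraph.xslt sample itself, and it does not use XSLT; it is an analogous filter written with the standard library, and the element names in the sample document (`header`, `p`, `caption`, `footnote`, `footer`) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical structured input; the element names are illustrative assumptions.
SAMPLE = """<document>
  <header>Quarterly Report</header>
  <p>Revenue grew in the third quarter.</p>
  <figure><caption>Figure 1: Revenue by region</caption></figure>
  <p>Costs remained <b>flat</b> year over year.</p>
  <footnote>Figures are unaudited.</footnote>
  <footer>Page 1</footer>
</document>"""


def extract_paragraphs(xml_text):
    """Return the text of every <p> element, ignoring all other content."""
    root = ET.fromstring(xml_text)
    # itertext() also gathers text inside inline children such as <b>.
    return ["".join(p.itertext()).strip() for p in root.iter("p")]


for paragraph in extract_paragraphs(SAMPLE):
    print(paragraph)
```

Headers, footers, captions, and footnotes are skipped simply because they are not `<p>` elements, which mirrors the select-only-paragraphs approach described above.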

We invite you to download the latest version of PDF Alchemist to automate extraction and save yourself time, resources, and costly mistakes!