About AHPDFXML Conversion Library

Features

AHPDFXML is a verbose format defined by Antenna House, Inc. that represents the content of a PDF in an intermediate XML structure. It is now possible to extract text, tables, and images from PDFs and convert them to an XML format, which we call AHPDFXML. This format is much easier to work with than the PDF binary format.

The AHPDFXML Conversion Library allows you to unlock the content of legacy PDFs. This product loads PDF files, decodes the data inside them, and converts it to the AHPDFXML format. Using XSLT, the AHPDFXML output can then be transformed into XML, HTML5, XSL-FO, DocBook, or any other file format.

Refer to AHPDFXML Schema Documentation for more detail.

Refer to The Difference between AHPDFXML V1 and V2 for the enhancements and corrections.

System Requirements

This product runs on the following Operating Systems:

AHPDFXML Conversion Library Operating Systems

Windows 32-bit:
  Windows Server 2008
  Windows 7
  Windows 8.1
  Windows 10

Windows 64-bit:
  Windows Server 2008 x64 Edition
  Windows Server 2008 R2 x64 Edition
  Windows Server 2012
  Windows Server 2012 R2
  Windows 7 x64 Edition
  Windows 8.1 x64 Edition
  Windows 10 x64 Edition

Linux 64-bit:
  Built with GCC 4.8 (requires the runtime library libstdc++.so.6)

Multiple instances of AHPDFXMLCmd.exe can run simultaneously on one system.
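Because several instances can run at once, a batch of PDFs can be converted in parallel by starting one process per file. The following is a minimal sketch of that idea, not part of the product: `run_batch` and `build_command` are hypothetical names, and the command-line arguments are deliberately left to the caller because this page does not document the options of AHPDFXMLCmd.exe.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Dict, Iterable, Sequence


def run_batch(
    pdf_paths: Iterable[Path],
    build_command: Callable[[Path], Sequence[str]],
    max_workers: int = 4,
) -> Dict[Path, int]:
    """Run one converter process per PDF, at most max_workers at a time.

    build_command is supplied by the caller and must return the complete
    argument list for a single input file. This sketch assumes nothing
    about the actual AHPDFXMLCmd.exe options.
    Returns a mapping of input path to process exit code.
    """
    paths = list(pdf_paths)

    def convert(path: Path) -> int:
        # Each call starts an independent process, so the instances
        # run side by side on the same system.
        return subprocess.run(list(build_command(path))).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        codes = list(pool.map(convert, paths))
    return dict(zip(paths, codes))
```

To use it, pass the input paths together with a function that builds the argument list for one file, following the command-line reference shipped with the product. Checking the returned exit codes shows which conversions need attention.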

Supported Document Formats

The following document formats can be converted with this product:

Input Document Format: Adobe PDF 1.3 - 1.7 (extension: .pdf)
Output Document Format: AHPDFXML

Overview

  1. Creates lines and paragraphs from character data within a PDF.
  2. Creates tables by determining the table borders from the line segments within a PDF.
  3. Converts a line drawing that is not part of a table border into an SVG image.
  4. If a user password is set on a PDF, the password is required for conversion.
  5. If the PDF has security restrictions that prevent copying the content, the owner password is required for conversion.

Limitations

  1. When a character code cannot be converted to Unicode, the characters may be output incorrectly.
  2. Gradients and patterns in line drawings are not supported.
  3. Annotation data is not supported.
  4. AcroForm data is not supported.

Copyright © 2015-2017 Antenna House, Inc. All rights reserved.
Antenna House is a trademark of Antenna House, Inc.