I don't know ABCpdf, but I would guess that most PDF libraries offer similar access to a PDF's content.
First, take a look at Das-ZUGFeRD-Format_1p0.pdf, especially page 112. The image there shows the object tree you have to traverse in order to find the XML stream.
The tree gives you the names, the types and the direction of each link. With that you can traverse the PDF object tree to get to the XML content you are looking for.
The steps, based on the diagram:
- Read your PDF
- Get the catalog inside your PDF
- Get the array named `AF` from the catalog
- Get the first element of the `AF` array (it should be a `file spec`)
- From the `file spec`, get the dictionary named `EF`
- Get the stream content of `EF`
These are the steps you need to perform in order to get to the content.
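Since I can't show this with ABCpdf, here is a rough sketch of the same traversal in Python, assuming the pypdf library is installed. The function names `resolve`, `extract_first_attachment` and `read_zugferd_xml` are made up for this illustration. It also assumes that the first `AF` entry is the ZUGFeRD XML and that the embedded stream sits under the `/F` key of the `EF` dictionary, which is where the PDF specification puts it. Treat it as a starting point, not a tested solution for every invoice.

```python
from typing import Any, Mapping


def resolve(obj: Any) -> Any:
    """Follow an indirect reference if the object offers get_object() (pypdf objects do)."""
    getter = getattr(obj, "get_object", None)
    return getter() if callable(getter) else obj


def extract_first_attachment(catalog: Mapping[str, Any]) -> bytes:
    """Walk Catalog -> AF -> file spec -> EF -> embedded stream and return its bytes."""
    af = resolve(catalog["/AF"])    # array of associated files
    file_spec = resolve(af[0])      # first entry should be a file spec
    ef = resolve(file_spec["/EF"])  # dictionary describing the embedded file
    stream = resolve(ef["/F"])      # assumption: the stream is stored under /F
    # pypdf stream objects expose get_data(); plain bytes are accepted for testing
    return stream.get_data() if hasattr(stream, "get_data") else bytes(stream)


def read_zugferd_xml(path: str) -> bytes:
    """Open a PDF with pypdf (assumed to be installed) and return the first AF attachment."""
    from pypdf import PdfReader  # lazy import so the traversal works without pypdf

    reader = PdfReader(path)
    catalog = reader.trailer["/Root"]  # the document catalog
    return extract_first_attachment(catalog)
```

With a real file you would call something like `read_zugferd_xml("invoice.pdf").decode("utf-8")`, where `invoice.pdf` is a placeholder name. Keeping the traversal separate from the file loading means the same four lookups can be ported to another library by swapping out only the way the catalog is obtained.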
To display the structure of a PDF and browse the tree, I would recommend using a tool like iText RUPS.