We would like to automate the processing of ZUGFeRD invoices. Is there a way to extract and save the XML files embedded in the PDF using Ghostscript?

  • No. Ghostscript doesn't do anything like that. You could probably do it with MuPDF, but I'm not an expert. – KenS Mar 21 '22 at 19:50
  • Just for completeness, MuPDF can do this with "mutool run docs/examples/pdf_portfolio.js". Obviously you could look at the JavaScript to see how it's done and potentially modify it if required. – KenS Mar 22 '22 at 07:51

1 Answer

As KenS mentioned, Ghostscript can help assemble ZUGFeRD files but cannot extract their contents. The image below shows those contents in the source XML (lower part) and in an apparently good PDF (upper part, viewed in WordPad), where the embedded XML is visible as plain text and can easily be extracted as text. However, nothing about PDF extraction is reliable, since the format of one PDF is rarely the same as the next unless you make it so.

Many PDF readers can export such attachments as the original source file, and many PDF libraries allow the named file to be extracted from a script.
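As a rough illustration of the scripted route, here is a minimal Python sketch. It assumes the third-party pypdf library is installed and that its PdfReader.attachments mapping (file name to a list of byte strings) is available in your version; the function names save_attachments and extract_with_pypdf are made up for this example, and I have not tested it against real ZUGFeRD invoices.

```python
from pathlib import Path


def save_attachments(attachments, out_dir):
    """Write each attachment (file name -> list of byte strings) into out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for name, contents in attachments.items():
        # Keep only the final path component so a crafted name cannot leave out_dir.
        safe_name = Path(name).name
        for index, data in enumerate(contents):
            # Prefix later copies in case several attachments share one name.
            target = out / (safe_name if index == 0 else f"{index}_{safe_name}")
            target.write_bytes(data)
            saved.append(target)
    return saved


def extract_with_pypdf(pdf_path, out_dir):
    """Extract all embedded files from pdf_path (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party, imported lazily

    reader = PdfReader(pdf_path)
    # Assumption: reader.attachments maps each file name to a list of byte strings.
    return save_attachments(reader.attachments, out_dir)
```

Calling extract_with_pypdf("invoice.pdf", "attachments") should then leave the embedded XML in the attachments folder, where invoice.pdf is a placeholder for your own file.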

[Image: the source XML (lower part) and the PDF viewed in WordPad (upper part)]

The samples above are from the Mustang project, an actively maintained open-source Java application: https://www.mustangproject.org/

For very simple cross-platform use, there is pdfdetach, which can save a single attachment by name or all attachments at once.
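If you want to drive pdfdetach from a batch job, a small Python wrapper could look like the sketch below. It assumes the Poppler build of pdfdetach is on the PATH and uses its -saveall and -o options (with -saveall, -o names the output directory); the helper names build_pdfdetach_command and extract_with_pdfdetach are hypothetical, so check pdfdetach -h for the options your build actually supports.

```python
import subprocess
from pathlib import Path


def build_pdfdetach_command(pdf_path, out_dir):
    """Build the argument list that saves every attachment into out_dir."""
    # Assumption: Poppler's pdfdetach, where -o is a directory when used with -saveall.
    return ["pdfdetach", "-saveall", "-o", str(out_dir), str(pdf_path)]


def extract_with_pdfdetach(pdf_path, out_dir):
    """Run pdfdetach and return the files found in out_dir afterwards."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if pdfdetach exits with an error.
    subprocess.run(build_pdfdetach_command(pdf_path, out_dir), check=True)
    return sorted(Path(out_dir).iterdir())
```

Running pdfdetach -list on the PDF first shows which attachments it contains, which is a quick way to confirm that the invoice really carries an embedded XML file.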

K J
  • Hey guys! Thanks for your answers. I almost thought so. Just wanted to be on the safe side. Thanks for the tips and the reference to mustangproject. I'll take a look. – CCSoftBarth Mar 23 '22 at 07:52