How to move XFA xml data into PDF/A-2 conforming File with iText/XFA Worker

Question

Adobe's ISO 32000 spec for PDF/A states that XFA data can be stored in a special place in a PDF/A-2 conforming PDF. Here is the text of that section:

Incorporation of XFA Datasets into a PDF/A-2 Conforming File

To support PDF/A-2 conforming files, ExtensionLevel 3 adds support for XML form data (XFA datasets) through the XFAResources name tree, which is part of the name dictionary of the document catalog. (See “TABLE 3.28 Entries in the name dictionary” on page 23.)

While Acrobat forms (and form data) are permitted in a PDF/A-2 conforming file, XML forms are not. Such XML forms are specified as XDP streams referenced from interactive form dictionaries. XDP streams can contain XFA datasets.

For applications that convert PDF documents to PDF/A-2, the XFAResources name tree supports relocation of XML form data from XDP streams in a PDF document into the XFAResources name tree.

The XFAResources name tree consists of a string name and an indirect reference to a stream. The string name is created at the time the document is converted to a PDF/A-2 conforming file. The stream contains the `<xfa:datasets>` element of the XFA, comprised of `<xfa:data>` elements.

In addition to data values for XML form fields, the `<xfa:datasets>` elements enable the storage and retrieval of other types of information that may be useful for other workflows, including data that is not bound to form fields, and one or more XML signature(s).

See the XML Architecture, XML Forms Architecture (XFA) Specification, version 2.6 in the Bibliography

We have an XFA Form that we pass xml to and now need to convert that document to PDF/A-2.

We are currently testing XFA Worker to see if it will allow us to do this, but I have been unable to find an XFA Worker sample that does it.

I first tried to flatten with XFA Worker, but that removes the data completely, so it can no longer be extracted.

How do you get the XFA xml data into the place that Adobe says to put it in with XFA Worker?

UPDATE: Thanks Bruno, my code isn't allowing me to convert the XFA Form to PDF/A-2. Here is the code I used.

    xfa.fillXfaForm(new ByteArrayInputStream(xmlSchemaStream.toByteArray()));

    stamper.close();
    reader.close();

    try (ByteArrayOutputStream outputStreamDest = new ByteArrayOutputStream()) {
        PdfReader pdfAReader = new PdfReader(output.toByteArray());

        PdfAStamper pdfAStamper = new PdfAStamper(pdfAReader, outputStreamDest, PdfAConformanceLevel.PDF_A_2A);
....

and I get the error `com.itextpdf.text.pdf.PdfAConformanceException: Only PDF/A documents can be opened in PdfAStamper.`

So I am now assuming the new PdfAStamper isn't a converter but just reading in the byte array of the XFA PDF.

Er... Of course `PdfAStamper` is not a converter. It's a class that allows you to stamp extra content (watermarks, page numbers, fill out forms) to an existing PDF/A document. You can't "feed" it an XFA form. `PdfAStamper` expects a PDF/A document. — Bruno Lowagie, Nov 04 '16 at 16:11
You said you were using XML Worker to convert XFA data to a PDF/A document, but now you have changed your question by saying that you use `PdfAStamper`. That is very confusing. I assumed that you were using XSLT on the XML embedded in the XFA form to convert the XFA data to HTML. I assumed that you were converting that HTML to PDF using XML Worker. Now I'm not so sure anymore. — Bruno Lowagie, Nov 04 '16 at 16:13
Sorry Bruno, I am totally new to XFA and PDF/A. The courts dictate that we use it. But I have a court XFA PDF, that I take XML generated by JAXB as a byte array, and use XMLWorker to fill in the already created Court PDF with that byte array. When that is done, I have to convert their XFA form to PDF/A in Java code, no HTML, no XSLT, pure Java. Then I need to move the XFA data that I had in JAXB XML into the Catalog. — bytor99999, Nov 04 '16 at 16:22
How do you use XML Worker to fill in the already created Court PDF? If you can do that, you know more about XML Worker than I do (and I'm the original developer of iText). — Bruno Lowagie, Nov 04 '16 at 16:27
Note: if you use `xfa.fillXfaForm(new ByteArrayInputStream(xmlSchemaStream.toByteArray()));` then you are not using XML Worker. You are using core iText functionality. Maybe you're not using XML Worker at all. In that case, please don't confuse the Stack Overflow visitor into thinking that you are. That's confusing. Bottom line: once you have filled out the form like this `xfa.fillXfaForm(new ByteArrayInputStream(xmlSchemaStream.toByteArray()));` you need XFA Worker to flatten that form. — Bruno Lowagie, Nov 04 '16 at 16:29
Thanks Bruno. I am sorry for the typo; I meant XFA Worker and not XML Worker. The names are so similar. ;) We were using XFA Worker, so we need to flatten. That means I spent two weeks going with XFA because the court had a form, and we couldn't generate XFA from Jasper Reports. Now that I have more information, which I couldn't find over the past two weeks, I think I should go back to the Jasper Report and just add the Catalog entry, skipping XFA completely, if I understand correctly. — bytor99999, Nov 04 '16 at 16:35
I have no clue about what you're saying. First you claim that the US Courts demand that you use XFA, now you claim that you can do without XFA. That all sounds very strange to me. — Bruno Lowagie, Nov 04 '16 at 16:44
@BrunoLowagie The only sample we have is an XFA form from the courts. It is a fillable form that some people can use, and it has a button that I suspect calls out to their LiveCycle server, which flattens the form, removes watermarks and buttons, and adds the data as xml to this Catalog entry. Only since you answered my questions do I think I understand that that must be what it is doing, as that information is not documented anywhere. Their form was XFA, so I assumed we had to use XFA in our app too. — bytor99999, Nov 04 '16 at 18:07
@BrunoLowagie It is really tough to figure out what you have to code when there is NO documentation anywhere that I could find for 2 weeks. (Not iText documentation, but what the Court has/wants and the PDF information to translate that to.) I was flying blind for a good two weeks. I got different information from different places. It has been very frustrating for me because of the lack of information out there. But thanks to your explanation about the catalog, that I can put the data there with the dictionary, and that this is what the courts want, I now have the information. — bytor99999, Nov 04 '16 at 18:09
@BrunoLowagie And it keeps getting crazy because I keep getting different information from the courts, because now they are saying the xml String should be URLEncoded and put into a MetaData xmp field. I'll meet you at the bridge, because I am ready to jump. :D — bytor99999, Nov 04 '16 at 18:44
1. I think they want you to flatten the XFA using something like XFA Worker; not create a different PDF using JasperReports. 2. They want to flatten to PDF/A (which is the archiving format of choice by the US government). 3. They want the original dataset to be in the file so that they can extract it. (a.) this can be done as described using **XFAResources** (the PDF/A-2 way), (b.) this can be done in the XMP stream of the XFA file (a more generic way). — Bruno Lowagie, Nov 05 '16 at 11:18
The advantage of XMP is that the software that examines the PDF file doesn't need to be PDF aware. That software just has to look for the XMP stream, extract it, and parse the XML. The main problem is that it is easy to explain how to add a dataset as URLEncoded data into the XMP stream, but as long as no one tells you which tag needs to be used inside that XMP file, no one can tell you how to do this exactly. One can guess, but the chance that we guess correctly is close to zero. Documentation from the US Courts is necessary. — Bruno Lowagie, Nov 05 '16 at 11:21
Let's do it this way: could you give me an address or a phone number of someone at the US Courts responsible for receiving these PDFs? I could ask someone at iText to contact the US Courts, and then we can write the documentation. E.g. what XMP is about, and where the US Courts expect you to insert the data. That's better than having to guess. Now I feel it's like the blind leading the blind ;-) — Bruno Lowagie, Nov 05 '16 at 14:47
@BrunoLowagie Thanks, with your information and finally a response from the courts, I am able to create what they need. 1) I do not need XFA; I thought I did since their form was an XFA form. 2) I only need a regular PDF that Jasper can generate, then add the xml to the xmp: field (got the name as well: "Nickname" is the field they want the URL Encoded xml String in), then use iText to convert it to a PDF/A. I have an email out to your sales person so we can ring up an iText license. — bytor99999, Nov 09 '16 at 21:16
Be careful: iText can **create** PDF/A files from scratch, but iText doesn't **convert** ordinary PDFs to PDF/A. Converting to PDF/A isn't always trivial if the original PDFs aren't well formed. Not every PDF with the blue bar on top saying "This PDF claims to be a PDF/A file" actually is a valid PDF/A file. I advise you not to create an ordinary PDF as an intermediary file; create a PDF/A document *from the start*. You wouldn't be the first integrator who makes the mistake of delivering PDFs with a blue PDF/A bar that aren't valid PDF/A files. — Bruno Lowagie, Nov 10 '16 at 08:30
@BrunoLowagie Thanks for that comment, I will find out how to make Jasper generate the PDF as a PDF/A format. — bytor99999, Nov 17 '16 at 17:40

score 0 · Accepted Answer · edited Jan 18 '21 at 12:30

Allow me to start with some fatherly advice. XFA will be deprecated in ISO-32000-2 (PDF 2.0) and it is great that you are turning your XFA documents into PDF/A documents. However, why would you choose PDF/A-2? PDF/A-3 is identical to PDF/A-2 with one exception: in PDF/A-3, you are allowed to embed XML files. You can even indicate the relationship between the attached XML and the PDF. Wouldn't it be smarter to create a PDF/A-3 file and to attach the original data (not the XFA file) as an attachment?

Suppose that you'd ignore this fatherly advice, what could you do?

Annex D of ISO-19005-2 (and -3) tells you that you have to add an entry to the Names dictionary of the document catalog. Unfortunately, iText 5 doesn't allow you to add your own entries to this names dictionary while creating a file, so you will have to post-process the document.

Suppose that you have a file located in filePath, then you can get the Catalog entry and the Names entry of the Catalog entry like this:

    PdfReader reader = new PdfReader(filePath);
    PdfDictionary catalog = reader.getCatalog();
    PdfDictionary names = catalog.getAsDict(PdfName.NAMES);

You can add entries to this names dictionary. For instance, suppose that I want to add a stream with the content "Some bytes" as a custom entry. I would use this code:

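    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.pdf.*;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        PdfDictionary catalog = reader.getCatalog();
        PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
        // The catalog may not have a Names dictionary yet; create one if needed.
        if (names == null) {
            names = new PdfDictionary();
        }
        // Write the stream as an indirect object and reference it from the Names dictionary.
        PdfStream stream = new PdfStream("Some bytes".getBytes(StandardCharsets.UTF_8));
        PdfIndirectObject objref = stamper.getWriter().addToBody(stream);
        names.put(new PdfName("ITXT_Custom"), objref.getIndirectReference());
        // Store the (possibly new) Names dictionary back in the catalog.
        catalog.put(PdfName.NAMES, names);
        stamper.close();
        reader.close();
    }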

In the resulting file, the Names dictionary of the catalog contains an ITXT_Custom entry that refers to the new stream.

In your case, you don't want an entry named ITXT_Custom. You want to add an entry called XFAResources, and the value of that entry should be a name tree consisting of a string name and an indirect reference to a stream. It should be fairly easy to adapt my example to achieve this.
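A minimal sketch of one way to adapt the example, assuming iText 5 and its `PdfNameTree.writeTree` helper to build the name tree. The class name `XfaResourcesSketch`, the method name `addXfaResources`, and the `treeKey` parameter are hypothetical names chosen for illustration; the spec text quoted in the question only says that the string name is created at conversion time, so which name to use is an assumption. The sketch only adds the entry; it does not check or guarantee PDF/A conformance of the result.

```java
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfIndirectReference;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfNameTree;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.PdfStamper;
import com.itextpdf.text.pdf.PdfStream;
import com.itextpdf.text.pdf.PdfWriter;

import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;

final class XfaResourcesSketch {

    /**
     * Adds an XFAResources name tree with a single entry to the Names
     * dictionary of the catalog.
     *
     * @param treeKey     string name for the entry (hypothetical; pick your own)
     * @param datasetsXml bytes of the xfa:datasets element to store
     */
    static void addXfaResources(PdfReader reader, OutputStream dest,
                                String treeKey, byte[] datasetsXml)
            throws IOException, DocumentException {
        PdfStamper stamper = new PdfStamper(reader, dest);
        PdfWriter writer = stamper.getWriter();

        PdfDictionary catalog = reader.getCatalog();
        PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
        if (names == null) {
            names = new PdfDictionary();
        }

        // The XML data goes into a stream that is written as an indirect object.
        PdfIndirectReference dataRef =
                writer.addToBody(new PdfStream(datasetsXml)).getIndirectReference();

        // A name tree maps string names to values; here: one name -> stream reference.
        HashMap<String, PdfObject> items = new HashMap<String, PdfObject>();
        items.put(treeKey, dataRef);
        PdfDictionary tree = PdfNameTree.writeTree(items, writer);

        // Hook the tree into /Names under the key /XFAResources.
        names.put(new PdfName("XFAResources"),
                writer.addToBody(tree).getIndirectReference());
        catalog.put(PdfName.NAMES, names);

        stamper.close();
        reader.close();
    }
}
```

To check the result, the tree can be read back with `PdfNameTree.readTree` on the `XFAResources` dictionary, and the stored bytes recovered with `PdfReader.getStreamBytes`.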

Note: All code provided by me on Stack Overflow can be used under the CC-BY-SA as defined in the Stack Exchange Network Terms of Service. If you do not like the CC-BY-SA, I also provide this code under the same license as used for iText, more specifically the AGPL.

Thank you so much @Bruno. We have a guy writing our XFA form for us, and the stress has affected my sleep on this, since it has gone past urgent on our part. We don't have a choice in the matter of the technology; it is dictated to us by who we have to submit to, and you have run across another person who was in the same position here: http://stackoverflow.com/questions/28304006/extract-embedded-xml-from-pdf-with-itextsharp-c. And your code above is just iText, no need for XFA Worker? That would mean just a license for iText, and not for iText and XFA Worker. — bytor99999, Nov 04 '16 at 14:50
If you're using XML Worker to create the PDF, then you only need the core iText without XFA Worker. Up until now, XML Worker is shipped with the core iText without an extra cost. The code to add the XFAResources entry doesn't require XFA Worker. — Bruno Lowagie, Nov 04 '16 at 14:54
Note that I don't understand how you would convert an XFA form to PDF/A-2 with XML Worker. I have never seen anyone do this without XFA Worker. I just assumed that you were using some XSLT to convert your specific XFA to HTML. — Bruno Lowagie, Nov 04 '16 at 16:09
Thanks Bruno. I saw that the code looked bad, so I added it as an update. OK, so we would still need XFA Worker, but only to convert from XFA to PDF/A. — bytor99999, Nov 04 '16 at 16:10
XFA Worker is the software we wrote to convert filled out XFA forms to PDF (regular PDF, PDF/A,...). If you don't want to use XFA Worker, you have to parse the XFA yourself and convert it to HTML. I assumed you were doing that. That could work for simple XFA forms, but it is a huge project to do this for more complex forms. — Bruno Lowagie, Nov 04 '16 at 16:15
