I'm trying to flatten form fields (PDAcroForm.flatten()) that contain rich text in a PDF. When I do, the formatting (bold, italics, color, size) is lost.

After flattening, the field is no longer editable, but the formatting is gone as well.

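    // Snippet for PDFBox 2.x; needs PDDocument from org.apache.pdfbox.pdmodel and
    // PDAcroForm, PDField, PDTextField from org.apache.pdfbox.pdmodel.interactive.form.
    String inputFileName = "test.pdf";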

    String val = "<?xml version=\"1.0\"?>"
            + "<body xmlns=\"http://www.w3.org/1999/xhtml\">"
            +   "<p style=\"color:#FF0000;font-size:8pt;\">"
            +       "<i>Small</i> <b>Red</b>&#13;"
            +   "</p>"
            +   "<p style=\"color:#00FF00;font-size:20pt;\">"
            +       "<i>Big</i> <b>Green</b>&#13;"
            +   "</p>"
            + "</body>";
    String valNoFormat = "Small Red\rBig Green\r";

    PDDocument pdf_document = PDDocument.load(new File(inputFileName));
    PDAcroForm acroForm = pdf_document.getDocumentCatalog().getAcroForm();
    PDTextField acroField = (PDTextField)acroForm.getField("example_field_number_one");

    acroField.setValue(valNoFormat);
    acroField.setRichTextValue(val);

    acroForm.setNeedAppearances(true);
    pdf_document.save(new File("output01.pdf"));

    // Collect every field in the form so that all of them get flattened.
    List<PDField> the_fields = new ArrayList<PDField>();
    for (PDField field : acroForm.getFieldTree()) {
        the_fields.add(field);
    }
    System.out.println("Flattening fields: " + the_fields.stream()
            .map(PDField::getFullyQualifiedName)
            .collect(Collectors.joining(", ", "[", "]")));
    acroForm.setNeedAppearances(true);
    // Second argument (refreshAppearances) asks PDFBox to rebuild the widget appearances first.
    acroForm.flatten(the_fields, true);
    pdf_document.save(new File("output02.pdf"));
    pdf_document.close();

I created the form elements with Adobe Acrobat Pro 10.1.1 via the form menu, and simply saved the PDF as test.pdf.

For completeness' sake, I uploaded everything to GitHub:

The question is: how can I remove the input field and flatten it while keeping the styling of the content, and preferably also features such as auto-sizing from the field?

luckydonald
  • PDFBox doesn't support rich text, so you'll get the "cheap" appearance. – Tilman Hausherr Mar 29 '19 at 13:23
  • @TilmanHausherr are there any viable workarounds and/or projects? Or any work already done where I could contribute to? – luckydonald Mar 29 '19 at 13:55
  • Not on PDFBox… you could have a look at the openhtmltopdf project ( https://github.com/danfickle/openhtmltopdf/ ) and then try to use that code to fill appearance streams. But this won't be done in a few minutes. – Tilman Hausherr Mar 29 '19 at 14:51
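
A possible first step for the appearance-stream route mentioned in the comments is to split the rich text value into styled runs. The sketch below is not PDFBox API: `RichTextRuns`, `Run`, and `parse` are hypothetical names, it uses only the JDK's XML parser, and it assumes the value is well-formed XHTML limited to `<p>`, `<b>`, `<i>` and `color` / `font-size` (in pt) style declarations, as in `val` above. The default color and size are assumptions too.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

/** Hypothetical helper: turns a rich text (XHTML) field value into styled runs. */
public class RichTextRuns {

    /** One piece of text together with the style in effect for it. */
    public static final class Run {
        public final String text;
        public final String color;   // CSS color string, e.g. "#FF0000"
        public final float sizePt;   // font size in points
        public final boolean bold;
        public final boolean italic;

        public Run(String text, String color, float sizePt, boolean bold, boolean italic) {
            this.text = text;
            this.color = color;
            this.sizePt = sizePt;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Parses the XHTML string; defaults (black, 12pt) are assumed, not taken from the PDF. */
    public static List<Run> parse(String xhtml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)))
                    .getDocumentElement();
            List<Run> runs = new ArrayList<>();
            walk(root, "#000000", 12f, false, false, runs);
            return runs;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed rich text", e);
        }
    }

    // Depth-first walk; style is inherited from the parent and overridden per element.
    private static void walk(Node node, String color, float size,
                             boolean bold, boolean italic, List<Run> out) {
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                // Whitespace-only text (spaces between tags, the trailing CR) is dropped here.
                String text = child.getNodeValue().trim();
                if (!text.isEmpty()) {
                    out.add(new Run(text, color, size, bold, italic));
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element element = (Element) child;
                String childColor = color;
                float childSize = size;
                // Only "color" and "font-size:<n>pt" are understood; everything else is ignored.
                for (String declaration : element.getAttribute("style").split(";")) {
                    String[] keyValue = declaration.split(":", 2);
                    if (keyValue.length < 2) {
                        continue;
                    }
                    String key = keyValue[0].trim();
                    String value = keyValue[1].trim();
                    if (key.equals("color")) {
                        childColor = value;
                    } else if (key.equals("font-size") && value.endsWith("pt")) {
                        childSize = Float.parseFloat(value.substring(0, value.length() - 2));
                    }
                }
                String tag = element.getTagName();
                walk(element, childColor, childSize,
                        bold || tag.equals("b"), italic || tag.equals("i"), out);
            }
        }
    }
}
```

Calling `RichTextRuns.parse(val)` would give one run per styled word. Each run could then be drawn with a matching font, size, and color when building the field's appearance stream. Layout, line breaks, and auto-sizing are not handled by this sketch.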

1 Answer

Maybe it suffices to remove the editable state:

    for (PDField field: pdf_document.getDocumentCatalog().getAcroForm().getFieldTree()) {
        field.setReadOnly(true);
    }

Joop Eggen
  • It seems the formatting in the field is not displayed by all viewers, e.g. the ones built into browsers. – luckydonald Mar 29 '19 at 12:24
  • E.g. in Chrome's pdf viewer, [output01.pdf](https://github.com/luckydonald-archive/pdf-stackoverflow-example/blob/cf891bb29c8be972396f5041ede4a318f4cd0bbf/output01.pdf) looks like [output02.pdf](https://github.com/luckydonald-archive/pdf-stackoverflow-example/blob/cf891bb29c8be972396f5041ede4a318f4cd0bbf/output02.pdf). No color, and no styling. – luckydonald Mar 29 '19 at 12:27
  • `Maybe it suffices to remove the editable state` - no, it doesn't. – Tilman Hausherr Mar 29 '19 at 13:29