I’m using PDFBox 2.0.4 to create PDF documents with AcroForm fields. Here is my test code:

PDDocument document = new PDDocument();
PDPage page = new PDPage(PDRectangle.A4);
document.addPage(page);

PDAcroForm acroForm = new PDAcroForm(document);
document.getDocumentCatalog().setAcroForm(acroForm);

String dir = "../testPdfBox/src/main/resources/fonts/";
PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));

PDResources resources = new PDResources();
String fontName = resources.add(font).getName();
acroForm.setDefaultResources(resources);

String defaultAppearanceString = String.format("/%s 12 Tf 0 g", fontName);
acroForm.setDefaultAppearance(defaultAppearanceString);

PDTextField field = new PDTextField(acroForm);
field.setPartialName("SampleField");
field.setDefaultAppearance(defaultAppearanceString);
acroForm.getFields().add(field);

PDAnnotationWidget widget = field.getWidgets().get(0);
PDRectangle rect = new PDRectangle(50, 750, 200, 50);
widget.setRectangle(rect);
widget.setPage(page);
widget.setPrinted(true);

page.getAnnotations().add(widget);

field.setValue("Sample field 123456");

acroForm.flatten();

document.save("target/SimpleForm.pdf");
document.close();
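
The default appearance string used above is a short piece of PDF content-stream syntax: `/<fontName> <size> Tf` selects a font resource at a given size, and `0 g` sets the fill color to black (gray level 0). As a minimal sketch of how that string is assembled, the helper below builds it from a resource name and a size. The class name `DefaultAppearance`, the method `build`, and the resource name `F1` are hypothetical and only serve the illustration; in the real code the name comes from `resources.add(font).getName()`.

```java
import java.util.Locale;

public class DefaultAppearance {

    // Builds "/<fontName> <fontSize> Tf 0 g".
    // Tf selects the font resource and size; "0 g" sets the fill gray level to 0 (black).
    // Locale.ROOT keeps the number formatting independent of the default locale.
    public static String build(String fontName, int fontSize) {
        return String.format(Locale.ROOT, "/%s %d Tf 0 g", fontName, fontSize);
    }

    public static void main(String[] args) {
        // "F1" is a hypothetical resource name used for illustration only.
        System.out.println(build("F1", 12)); // prints "/F1 12 Tf 0 g"
    }
}
```

The resulting string is what gets passed to `acroForm.setDefaultAppearance(...)` and `field.setDefaultAppearance(...)` in the code above.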

Everything works fine, but when I copy text from the created document and paste it into Notepad or Word, it turns into squares.

I searched a lot about this problem. The most common answer is that the created PDF has no ToUnicode CMap. So I explored my document with CanOpener for Acrobat.

Yes, there is no ToUnicode CMap, but everything works properly if I don't call acroForm.flatten(). When the form fields are not flattened, I can copy and paste text from the document and it looks correct. Nevertheless, I need all fields to be flattened.

So, I have two questions:

  1. Why is there a problem with copying and pasting text in the flattened form, while everything is fine in the non-flattened one?

  2. What can I do to avoid the copy/paste problem? Is the only solution to create the ToUnicode CMap on my own, like in this example?

My test pdf files are available here.

Ketty K.
  • Please try 1) use the current version 2) change `PDType0Font.load` so that embed is set to `true`. If it still doesn't work then link to the result file. – Tilman Hausherr Jan 31 '18 at 16:51
  • It works in 2.0.4 when the boolean passed to `PDType0Font.load` is set to `false`. Thank you very much! – Ketty K. Jan 31 '18 at 20:20

1 Answer

Please replace

PDType0Font font = PDType0Font.load(document, new File(dir + "Roboto-Regular.ttf"));

with

PDType0Font font = PDType0Font.load(document, new FileInputStream(dir + "Roboto-Regular.ttf"), false);

Passing `false` for the `embedSubset` parameter makes sure that the font is embedded in full and not just as a subset.

Tilman Hausherr
  • The [2.0.1 documentation for `load`](https://pdfbox.apache.org/docs/2.0.1/javadocs/org/apache/pdfbox/pdmodel/font/PDType0Font.html#load(org.apache.pdfbox.pdmodel.PDDocument,%20java.io.InputStream,%20boolean)) says `embedSubset - True if the font will be subset before embedding` -- so I think your boolean should be false? – Fuhrmanator Sep 05 '18 at 20:50
  • Indeed. Corrected. Thanks! – Tilman Hausherr Sep 06 '18 at 02:55