I'm using PDFBox 2.0.1.
I try to dynamically add some (user provided) UTF8 text to the form fields and show the result to the user. Unfortunately either the pdf library is not capable of properly encoding special characters such as "äöü"... or I was not able find any useful documentation that could help me with this issue.
Can someone tell me what is wrong with the given code sample?
try (PDDocument document = PDDocument.load(pdfTemplate)) {
PDDocumentCatalog catalog = document.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDField> fields = form.getFields();
for (PDField field : fields) {
switch (field.getPartialName()) {
case "devices":
// Frontend (JS): userInput = btoa('Gerät')
String userInput = ...
String name = new String(Base64.getDecoder().decode(base64devices), "UTF-8");
field.setReadOnly(true);
break;
}
}
form.flatten(fields, true);
document.save(bos);
}
And here the stacktrace of the error:
java.lang.IllegalArgumentException: U+FFFD is not available in this font's encoding: WinAnsiEncoding
org.apache.pdfbox.pdmodel.font.PDTrueTypeFont.encode(PDTrueTypeFont.java:368)
org.apache.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:286)
org.apache.pdfbox.pdmodel.font.PDFont.getStringWidth(PDFont.java:315)
org.apache.pdfbox.pdmodel.interactive.form.PlainText$Paragraph.getLines(PlainText.java:169)
org.apache.pdfbox.pdmodel.interactive.form.PlainTextFormatter.format(PlainTextFormatter.java:182)
org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.insertGeneratedAppearance(AppearanceGeneratorHelper.java:373)
org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceContent(AppearanceGeneratorHelper.java:237)
org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(AppearanceGeneratorHelper.java:144)
org.apache.pdfbox.pdmodel.interactive.form.PDTextField.constructAppearances(PDTextField.java:263)
org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.refreshAppearances(PDAcroForm.java:324)
org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.flatten(PDAcroForm.java:213)
my.application.service.PDFService.generatePDF(PDFService.java:201)
I also found those (related) issues on SO:
pdfbox: ... is not available in this font's encoding But that does not help me choose the right encoding or how. IIRC Java uses UTF16 internally for character encoding why is the default not enough though? Is that an issue of the PDF-document itself or the code I use to set it?
PdfBox encode symbol currency euro Well its dynamic user input, so there are way to many things I would have to replace myself.
Thus, if the PDFBox people decided to fix the broken PDFBox method, this seemingly clean work-around code here would start to fail as it would then feed the fixed method broken input data.
Admittedly, I doubt they will fix this bug before 2.0.0 (and in 2.0.0 the fixed method has a different name), but one never knows...
Unfortunately I was not able to find this other setter method, but it might also be a different scope it does apply to.
EDIT
Updated example code to better represent the problem.