I am using PDFBox 1.8.10 to load PDFs and to overlay images on each page.

PDDocument doc = PDDocument.load(url);
PDFImageWriter imageWriter = new PDFImageWriter();
imageWriter.writeImage(doc, imageFormat, password, 1,
        doc.getNumberOfPages(), filePrefix, imageType, resolution);

I have tried saving the doc as a PDF, and that looks fine. When the pages are saved as images, however, the images can contain incorrect text. This is especially true for Eastern European documents, e.g. from Hungary, Poland, or the Czech Republic.

The PDF shows

H-4432 NYÍREGYHÁZA-NYÍRSZŐLŐS

The image shows incorrect text instead (screenshot not preserved).

Is there a solution for this? Do I need to define a codepage? Could it be a problem with the available fonts?
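One way to look into the font question is to check which installed AWT fonts can display the affected characters. The sketch below is only a diagnostic under assumptions: the class name `FontCoverage` and the helper `firstUndisplayable` are made up for illustration, and the check covers the fonts Java itself can see. It does not show which font PDFBox actually picks when rendering.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.function.IntPredicate;

public class FontCoverage {

    /**
     * Returns the index of the first code point in text that the predicate
     * rejects, or -1 if every code point is accepted.
     */
    public static int firstUndisplayable(String text, IntPredicate canDisplay) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (!canDisplay.test(codePoint)) {
                return i;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        // "H-4432 NYÍREGYHÁZA-NYÍRSZŐLŐS", written with Unicode escapes so the
        // source file compiles the same way regardless of its encoding.
        String sample = "H-4432 NY\u00CDREGYH\u00C1ZA-NY\u00CDRSZ\u0150L\u0150S";

        GraphicsEnvironment env = GraphicsEnvironment.getLocalGraphicsEnvironment();
        for (String family : env.getAvailableFontFamilyNames()) {
            Font font = new Font(family, Font.PLAIN, 12);
            int missing = firstUndisplayable(sample, cp -> font.canDisplay(cp));
            if (missing >= 0) {
                // Report the first character this font family cannot display.
                String ch = new String(Character.toChars(sample.codePointAt(missing)));
                System.out.println(family + " has no glyph for '" + ch + "'");
            }
        }
    }
}
```

If most families report a missing glyph for Ő (U+0150), missing fonts are a plausible contributor. If none do, the cause more likely lies in how the renderer handles the PDF's fonts and encodings.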

paul
    See this: http://stackoverflow.com/questions/22260344/pdfbox-encode-symbol-currency-euro – Adam Michalik Sep 01 '15 at 09:59
    PDFBox's capabilities for rendering PDFs to images are quite limited in the 1.x versions. They have improved considerably in the 2.0.0-SNAPSHOT development versions, cf. [this answer](http://stackoverflow.com/a/24238070/1729265), [this answer](http://stackoverflow.com/a/22358240/1729265), and [this one](http://stackoverflow.com/a/21547909/1729265). Unfortunately the PDFBox 2.0.0-SNAPSHOT API is a moving target, massively refactored every other month, so the code in those answers may not work out of the box anymore. – mkl Sep 01 '15 at 11:25

1 Answer

The solution for me was to switch to a 2.0 SNAPSHOT build (August 2015). All the documents I've tested render correctly. The API has changed but, in my case, it took about 5 minutes to make the changes.

Thanks to @mkl for the info.
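For readers making the same switch, here is a minimal sketch of what page rendering might look like with the 2.0 API. It follows the released PDFBox 2.0.x API (`PDFRenderer`, `ImageType`), which may differ from the August 2015 snapshot. The class name, the `pageFileName` helper, the command-line arguments, and the 150 DPI resolution are all assumptions for illustration, not the code from the question.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

public class RenderPagesSketch {

    // Hypothetical helper: builds names such as "out-1.png".
    public static String pageFileName(String filePrefix, int pageNumber, String format) {
        return filePrefix + pageNumber + "." + format;
    }

    public static void main(String[] args) throws IOException {
        File pdf = new File(args[0]);   // path to the input PDF (assumed argument)
        String filePrefix = args[1];    // output prefix, e.g. "out-" (assumed argument)
        float dpi = 150f;               // assumed resolution

        // PDDocument is Closeable in 2.0, so try-with-resources releases it.
        try (PDDocument doc = PDDocument.load(pdf)) {
            PDFRenderer renderer = new PDFRenderer(doc);
            // Page indexes are zero-based in the 2.0 renderer.
            for (int page = 0; page < doc.getNumberOfPages(); page++) {
                BufferedImage image = renderer.renderImageWithDPI(page, dpi, ImageType.RGB);
                ImageIO.write(image, "png", new File(pageFileName(filePrefix, page + 1, "png")));
            }
        }
    }
}
```

The main difference from the 1.8 code is that `PDFImageWriter` is gone: the document is rendered one page at a time through `PDFRenderer`, and writing the image file is left to the caller.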

paul