Using BufferedImage bim = pdfRenderer.renderImage(page), an image is created, but the text that cannot be selected in the PDF is left blank. How can this text be included in the image?

[Screenshot: this PDF only allows the user to select "This is to certify that"]

The bold text does not appear in the rendered image. It also cannot be selected or highlighted in the PDF.

[Screenshot: the bold text has disappeared]

What am I overlooking here? Or is this some kind of PDF security feature that renderImage is unable to deal with?

The log displays these warnings:

19:31:01.595 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+44 (44) in font CJAOPE+Arial
19:31:01.595 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+81 (81) in font CJAOPE+Arial
19:31:01.596 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+74 (74) in font CJAOPE+Arial
19:31:01.596 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+72 (72) in font CJAOPE+Arial
19:31:01.596 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+89 (89) in font CJAOPE+Arial
19:31:01.597 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+82 (82) in font CJAOPE+Arial
19:31:01.597 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+79 (79) in font CJAOPE+Arial
19:31:01.598 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+36 (36) in font CJAOPE+Arial
19:31:01.598 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+85 (85) in font CJAOPE+Arial
19:31:01.598 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+87 (87) in font CJAOPE+Arial
19:31:01.599 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+76 (76) in font CJAOPE+Arial
19:31:01.599 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+78 (78) in font CJAOPE+Arial
19:31:01.599 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+25 (25) in font CJAOPE+Arial
19:31:01.600 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+11 (11) in font CJAOPE+Arial
19:31:01.600 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+20 (20) in font CJAOPE+Arial
19:31:01.600 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+12 (12) in font CJAOPE+Arial
19:31:01.601 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+68 (68) in font CJAOPE+Arial
19:31:01.602 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+71 (71) in font CJAOPE+Arial
19:31:01.602 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+58 (58) in font CJAOPE+Arial
19:31:01.603 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+83 (83) in font CJAOPE+Arial
19:31:01.604 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+37 (37) in font CJAOPE+Arial
19:31:01.604 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+75 (75) in font CJAOPE+Arial
19:31:01.606 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+55 (55) in font CJAOPE+Arial
19:31:01.606 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+88 (88) in font CJAOPE+Arial
19:31:01.607 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+86 (86) in font CJAOPE+Arial
19:31:01.608 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+15 (15) in font CJAOPE+Arial
19:31:01.609 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+28 (28) in font CJAOPE+Arial
19:31:01.609 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+27 (27) in font CJAOPE+Arial
19:31:01.610 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+24 (24) in font CJAOPE+Arial
19:31:01.610 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+26 (26) in font CJAOPE+Arial
19:31:01.612 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+80 (80) in font CJAOPE+Arial
19:31:01.612 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+73 (73) in font CJAOPE+Arial
19:31:01.613 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+54 (54) in font CJAOPE+Arial
19:31:01.613 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+70 (70) in font CJAOPE+Arial
19:31:01.615 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+51 (51) in font CJAOPE+Arial
19:31:01.615 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+92 (92) in font CJAOPE+Arial
19:31:01.616 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+38 (38) in font CJAOPE+Arial
19:31:01.618 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+49 (49) in font CJAOPE+Arial
19:31:01.619 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+29 (29) in font CJAOPE+Arial
19:31:01.702 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+43 (43) in font CJAOPE+Arial
19:31:01.703 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+90 (90) in font CJAOPE+Arial
19:31:01.705 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+18 (18) in font CJAOPE+Arial
19:31:01.713 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+69 (69) in font CJAOPE+Arial
19:31:01.713 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+93 (93) in font CJAOPE+Arial
19:31:01.719 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+42 (42) in font CJAOPE+Arial
19:31:01.720 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+40 (40) in font CJAOPE+Arial
19:31:01.721 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+57 (57) in font CJAOPE+Arial
19:31:01.727 [main] DEBUG org.apache.fontbox.ttf.PostScriptTable - No PostScript name information is provided for the font CJBBIL+Calibri
19:31:01.728 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+17 (17) in font CJAOPE+Arial
19:31:01.746 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+39 (39) in font CJAOPE+Arial
19:31:01.747 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+53 (53) in font CJAOPE+Arial
19:31:01.747 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+48 (48) in font CJAOPE+Arial
19:31:01.747 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+50 (50) in font CJAOPE+Arial
19:31:01.747 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+41 (41) in font CJAOPE+Arial
19:31:01.747 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+45 (45) in font CJAOPE+Arial
19:31:01.747 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+56 (56) in font CJAOPE+Arial
19:31:01.748 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+47 (47) in font CJAOPE+Arial
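
A long run of warnings like this is easier to read once it is grouped by font. The following is only a minimal sketch, assuming the log lines have exactly the format shown above; `UnmappedCidSummary` and `summarize` are hypothetical names for illustration and are not part of PDFBox.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a PDFBox class): groups unmapped CIDs by font name.
class UnmappedCidSummary {
    // Matches the tail of a warning such as
    // "No Unicode mapping for CID+44 (44) in font CJAOPE+Arial".
    private static final Pattern WARNING =
            Pattern.compile("No Unicode mapping for CID\\+(\\d+) \\(\\d+\\) in font (\\S+)");

    // Returns font name -> sorted set of CIDs reported without a Unicode mapping.
    static Map<String, TreeSet<Integer>> summarize(List<String> logLines) {
        Map<String, TreeSet<Integer>> cidsByFont = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher m = WARNING.matcher(line);
            if (m.find()) {
                cidsByFont.computeIfAbsent(m.group(2), k -> new TreeSet<>())
                          .add(Integer.parseInt(m.group(1)));
            }
        }
        return cidsByFont;
    }

    public static void main(String[] args) {
        // Three lines copied from the log above; the DEBUG line is ignored.
        List<String> sample = Arrays.asList(
            "19:31:01.595 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+44 (44) in font CJAOPE+Arial",
            "19:31:01.595 [main] WARN org.apache.pdfbox.pdmodel.font.PDType0Font - No Unicode mapping for CID+81 (81) in font CJAOPE+Arial",
            "19:31:01.727 [main] DEBUG org.apache.fontbox.ttf.PostScriptTable - No PostScript name information is provided for the font CJBBIL+Calibri");
        summarize(sample).forEach((font, cids) ->
            System.out.println(font + ": " + cids.size() + " unmapped CIDs " + cids));
    }
}
```

Run against the full log, this would show whether every warning refers to the same embedded font subset or to several fonts.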

Further investigation led me to two earlier posts: "How to solve no unicode mapping error from PDFBox?" and "how to add unicode in truetype0font on pdfbox 2.0.0?".

Although the text does not show up in the image rendered from the PDF, the PDFDebugger tool does show it, apparently as part of an annotation, and it is in fact the missing text.

[Screenshot: PDFDebugger view of the relevant text]

  • Using the JPedal or Qoppa jars, the bold text is included in the image and is easily OCRed into text. Why would the PDFBox BufferedImage exclude the bold text? – Pontifx101 Feb 22 '23 at 04:02
  • Please tell us which version you are using and link to the file. Also include any log messages you get. Run PDFDebugger if you are not sure. – Tilman Hausherr Feb 22 '23 at 04:11
  • I would **not** recommend using the open source version of jpedal. I used to maintain one of the last [forks](https://github.com/Lonzak/JPedal), but we migrated to pdfbox since it is actively maintained and also works with newer Java versions. It also fixes lots of bugs that jpedal still has. So provide your PDF as Tilman suggested and we can take a look... – Lonzak Feb 22 '23 at 08:03
  • @TilmanHausherr thank you, the log indeed holds the clue, and I see you have applied your mind to earlier postings of this nature. The version I use is pdfbox 2.0.27. Not quite sure yet where to go from here. – Pontifx101 Feb 22 '23 at 17:57
  • Is the text in an OCG that is configured to not appear in printing? – mkl Feb 23 '23 at 10:12
  • So it's in a field annotation widget. It would be interesting to see what kind of field this is (you show only the lower part) and whether there is an appearance stream. By the way, the warnings are harmless; they apply to text extraction. I'd still like to get the file. – Tilman Hausherr Feb 23 '23 at 12:55
