I'm using the com.itextpdf library in Java to generate and edit a PDF. I'm facing a weird issue: the PDF content (a date) is not rendered properly inside the PDF.

I initially create the PDF file using iText only and later, in post-processing, replace the PDF content (the date).

For example, the date 28 Nov 2020 is rendered incorrectly, with slight differences on each run at the comma, space, or digit level: [image]

Things I tried:

  1. Upgraded iText from the older version 5.5.6 to the latest, 5.5.13.2.
  2. Tried multiple fonts.
  3. Tried both encodings, UTF-8 and ISO-8859-1; still no luck.

Any pointers would be helpful.

    // Initial placeholder:
    String TEMPORARY_DATE_PLACE_HOLDER = "----------------";
    // BaseFont (tried with embedded set to both true and false):
    BaseFont.createFont("/arial.ttf", BaseFont.WINANSI, false);
    // ... (remaining generation code omitted) ...

    // Post-processing, where the placeholder is replaced:
    reader = new PdfReader(InputPDF);
    PdfDictionary dict   = reader.getPageN(1);
    PdfObject     object = dict.getDirectObject(PdfName.CONTENTS);
    if (object instanceof PRStream) {
        PRStream stream = (PRStream) object;
        byte[]   data   = PdfReader.getStreamBytes(stream);
        String CHARACTER_ENCODING_SET = "ISO-8859-1";
        String dataString = new String(data, CHARACTER_ENCODING_SET);

        if (dateFormatList.contains(requiredDate)) {
            // replace() treats the placeholder literally; replaceAll() would parse it as a regex
            dataString = dataString.replace(TEMPORARY_DATE_PLACE_HOLDER, new SimpleDateFormat(dateFormat).format(requiredDate));
        }
        stream.setData(dataString.getBytes(CHARACTER_ENCODING_SET));
    }

    stamper = new PdfStamper(reader, out);
    stamper.close();
    reader.close();
    byte[] fileContent = out.toByteArray();
    helperToWrite(new ByteArrayInputStream(fileContent), "OutputPDF");

    // Helper method to write into a file:
    private File helperToWrite(InputStream inputStream, String name) {
        // Assumption: `file` was never declared in the posted snippet; it is derived from `name` here
        File file = new File(name);
        try (OutputStream outputStream = new FileOutputStream(file)) {
            int    read;
            byte[] bytes = new byte[1024];

            while ((read = inputStream.read(bytes)) != -1) {
                outputStream.write(bytes, 0, read);
            }
        } catch (Exception e) {
            // The posted snippet swallowed the exception silently; at least report it
            e.printStackTrace();
        }
        return file;
    }
Mandar Pande
  • You say you're *replacing the date*. If you do that the naive way, i.e. by search-and-replace in a content stream, such problems are to be expected, see for example [here](https://stackoverflow.com/a/34315962/1729265). – mkl Dec 08 '20 at 10:32

1 Answer

You show neither your code nor any example PDFs, so one cannot seriously analyze the situation. But as you nonetheless offered a bounty, you appear to be interested in (educated) guesses, too. So here is mine:

The issue

You say you're replacing the date. Considering your issue I assume you do that the naïve way, i.e. by search-and-replace in a content stream.

If this is the case, you can find the most plausible explanation why that happens in this old answer: The font your text-to-replace is drawn with is subset-embedded, i.e. only the glyphs from the font which actually are used in the original document are embedded in the document. Your replacement text on the other hand contains characters not covered by this subset. Thus, those characters are missing.
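To make the mechanism concrete, here is a minimal sketch using only the JDK. The content stream line, the font name `/F1`, and the class and method names are made up for illustration; a real iText stream will look different, and the real subset is built from all text drawn with that font, not just one string. The point is only that the replacement touches the string operand while the font resource stays as it was.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

class NaiveReplaceDemo {

    // Naive search-and-replace on the raw content stream bytes (ISO-8859-1 keeps bytes 1:1).
    static byte[] replaceInStream(byte[] content, String placeholder, String replacement) {
        String text = new String(content, StandardCharsets.ISO_8859_1);
        return text.replace(placeholder, replacement).getBytes(StandardCharsets.ISO_8859_1);
    }

    // Characters of the replacement that never occurred in the originally drawn text,
    // i.e. candidates for glyphs missing from a subset built from that text alone.
    static String missingGlyphs(String drawnText, String replacement) {
        Set<Character> missing = new LinkedHashSet<>();
        for (char c : replacement.toCharArray()) {
            if (drawnText.indexOf(c) < 0) {
                missing.add(c);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (char c : missing) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical, simplified content stream: select font /F1, draw one string.
        String stream = "BT /F1 12 Tf 72 720 Td (Date: ----------------) Tj ET";
        byte[] edited = replaceInStream(
                stream.getBytes(StandardCharsets.ISO_8859_1), "----------------", "28 Nov 2020");

        // The operand changed, but /F1 still refers to the same (subset) font.
        System.out.println(new String(edited, StandardCharsets.ISO_8859_1));
        System.out.println("Not covered by the original text: "
                + missingGlyphs("Date: ----------------", "28 Nov 2020"));
    }
}
```

In this toy case every digit and letter of the new date is absent from the originally drawn text, which matches the kind of damage described in the question.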

The solution in general

The answer referred to above also explains what to do instead in general: first determine the coordinates of the text to replace using text extraction with coordinates, then remove that text, e.g. by redaction, and finally add your replacement using an own font object.

Additionally this answer summarizes the problems one has to tackle for generic search-and-replace of text in PDFs in more detail.

So far, therefore, your question is a duplicate of those questions.

Another solution for your special case

There is one aspect that differs, though: you say you're "using com.itextpdf library in java to generate and edit the PDF", i.e. you don't have to edit arbitrary PDFs but only PDFs you generate yourself with iText. Thus, you can also create the original PDF differently to make it better suited for your later editing!

To generate your template PDF in a way better suited for text search and replace, either don't subset-embed fonts at all or make sure that the embedded subset is large enough for your planned replacement texts.

Fonts are embedded if in your BaseFont.createFont call you set the Boolean parameter embedded to true (== BaseFont.EMBEDDED) or if you set the String parameter encoding to "Identity-H" (== BaseFont.IDENTITY_H) or "Identity-V" (== BaseFont.IDENTITY_V). For the former approach (no embedding at all) you have to avoid both.

Fonts iText embeds as subsets contain all the glyphs required by the text showing instructions in the PDF content streams. To make sure some glyph is in the subset, simply draw it somewhere. For the latter approach, therefore, collect all characters you may need in your replacement texts in a string and draw it. You can draw it invisibly (white on white, rendering mode invisible, covered by something, outside the clip path or crop box, ...) or you can remove it during your search-and-replace pass.
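Collecting "all characters you may need" can be automated for dates. The following is a rough sketch using only the JDK; the class and method names, the pattern "dd MMM yyyy", the locale, and the year range are assumptions chosen for illustration, so substitute whatever your application really uses. The resulting string is what you would then draw once (e.g. invisibly) in the template with the same font.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TreeSet;

class ReplacementCharset {

    // Formats every day from 1 Jan fromYear to 31 Dec toYear and
    // returns each distinct character that occurs, in sorted order.
    static String charactersFor(String pattern, Locale locale, int fromYear, int toYear) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        TreeSet<Character> seen = new TreeSet<>();
        Calendar day = new GregorianCalendar(fromYear, Calendar.JANUARY, 1);
        while (day.get(Calendar.YEAR) <= toYear) {
            for (char c : format.format(day.getTime()).toCharArray()) {
                seen.add(c);
            }
            day.add(Calendar.DAY_OF_MONTH, 1);
        }
        StringBuilder sb = new StringBuilder();
        for (char c : seen) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed pattern and range; the string printed here is the one to draw in the template.
        System.out.println(charactersFor("dd MMM yyyy", Locale.ENGLISH, 2020, 2030));
    }
}
```

If several date formats or locales are possible, call the method once per combination and concatenate the results; duplicates in the drawn string do no harm.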

Even in case of non-embedded fonts iText in some cases creates subsets. Thus, you should explicitly disable subsetting even for non-embedded fonts.

Applied to your code

In the meantime you shared your pivotal code, and indeed, I could reproduce the issue with it. The problem is that your BaseFont

    BaseFont.createFont("/arial.ttf", BaseFont.WINANSI, false)

is subsetted even though it is not embedded. Thus, use the following instead:

    BaseFont baseFont = BaseFont.createFont("/arial.ttf", BaseFont.WINANSI, false);
    baseFont.setSubset(false);
mkl
  • Thanks @mkl for your response, I have added the code snippet. – Mandar Pande Jan 18 '21 at 08:52
  • @MandarPande I edited my answer. In case of your code the font is subsetted even though it is not embedded. As described in my edit you can simply deactivate subsetting by setting the `Subset` property accordingly. – mkl Jan 18 '21 at 14:45
  • Thanks for the suggestion, I'll check today. – Mandar Pande Jan 19 '21 at 08:41