I'm trying to figure out how to find text that I've previously added to a PDF with iText7.

I'm playing around with iText7, and have the following code:

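using iText.IO.Font.Constants;   // StandardFonts
using iText.Kernel.Font;         // PdfFont, PdfFontFactory
using iText.Kernel.Geom;         // PageSize
using iText.Kernel.Pdf;          // PdfDocument, PdfWriter
using iText.Layout;              // Document
using iText.Layout.Element;      // Paragraph

class Program
{
  static void Main(string[] args)
  {
    PdfDocument pdfDocument = new PdfDocument(new PdfWriter("./test.pdf"));
    pdfDocument.AddNewPage(PageSize.LETTER.Rotate());

    Document document = new Document(pdfDocument);

    PdfFont helv = PdfFontFactory.CreateFont(StandardFonts.HELVETICA);

    Paragraph paragraph = new Paragraph("test string");
    paragraph.SetFont(helv);
    paragraph.SetFontSize(8);
    // left = 500, bottom = 194, width = 100
    paragraph.SetFixedPosition(500, 194, 100);
    document.Add(paragraph);
    document.Close();
  }
}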

I then run different code to get me the streams, which shows me the following:

q
BT
/F1 8 Tf
500 197.54 Td
(test string)Tj
ET
Q

Of note is that where I specified a Y position of 194, the resulting PDF shows 197.54. If I add (user-supplied) text to the PDF and later want to go back and replace that text with something else, I know that, at least for that specific font and size, I have to add 3.54 to the Y I originally specified. I'm assuming that has something to do with the font's baseline vs. iText specifying the bottom of the text block.

My question is: how can I calculate what that "3.54" is for any other font or size I might use? Is there info I can get from iText to help, or is it just "multiply the font size by 0.44 for Helvetica, and 0.35 for Courier, etc."?

So far, and maybe it's just been lucky, I haven't seen any issues with a string of text being split up into different Td/Tj commands, so I'm going to ignore that potential future problem for the moment.

Thanks!

Phil M

1 Answer

If you let iText determine the layout of text, even when using SetFixedPosition, a multitude of values enters that calculation, in particular the font descent, the leading, paddings and margins, ...

These complications are there primarily to make it possible to emulate HTML/CSS-like typesetting features.

Furthermore, iText applies rounding to the numbers it writes to the content stream.
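For the simple case in the question, the 3.54 can be reproduced from the font metrics. The following is only a sketch under assumptions, not iText's documented behavior: it assumes that the layout scales the font's typographic ascender and descender by 1.2, that a Paragraph uses a default multiplied leading of 1.35, and that no padding, border, or margin contributes. The class and method names (`BaselineOffset`, `Compute`) are made up for illustration. The metric values are the ones from the standard Helvetica and Courier AFM files; in iText they should be available via `font.GetFontProgram().GetFontMetrics()` (`GetTypoAscender()`, `GetTypoDescender()`), but check this against your iText version.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: estimates how far above the "bottom" given to
// SetFixedPosition the text baseline ends up.
static class BaselineOffset
{
    const double MetricsScale = 1.2;    // ASSUMPTION: internal ascender/descender scaling
    const double DefaultLeading = 1.35; // ASSUMPTION: default multiplied leading of a Paragraph
    const double UnitsPerEm = 1000.0;   // font metrics are given in 1/1000 of the font size

    // ascender and descender are in 1/1000 em; descender is negative (e.g. -207).
    public static double Compute(int ascender, int descender, double fontSize)
    {
        double descent = -descender * MetricsScale;
        double textHeight = (ascender - descender) * MetricsScale;
        // The extra space from the leading is split evenly above and below the text.
        double halfExtraLeading = textHeight * (DefaultLeading - 1) / 2;
        return (descent + halfExtraLeading) * fontSize / UnitsPerEm;
    }

    static void Main()
    {
        CultureInfo inv = CultureInfo.InvariantCulture;
        // Helvetica (AFM: Ascender 718, Descender -207) at 8 pt
        Console.WriteLine(Compute(718, -207, 8).ToString("F2", inv)); // prints 3.54
        // Per-point factors for Helvetica and Courier (AFM: 629, -157)
        Console.WriteLine(Compute(718, -207, 1).ToString("F2", inv)); // prints 0.44
        Console.WriteLine(Compute(629, -157, 1).ToString("F2", inv)); // prints 0.35
    }
}
```

Under these assumptions the result matches the 197.54 − 194 = 3.54 observed in the question as well as the 0.44 and 0.35 factors mentioned there. Since the two constants are internals, they may differ in other iText versions or as soon as leading, padding, or margins are changed.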

If you instead want to be able to easily recognize the position you gave, don't let iText determine the layout of the text but do it yourself, and use coordinates which won't be harmed by rounding, e.g. instead of

Paragraph paragraph = new Paragraph("test string");
paragraph.SetFont(helv);
paragraph.SetFontSize(8);
paragraph.SetFixedPosition(500, 194, 100);
document.Add(paragraph);

do

PdfCanvas canvas = new PdfCanvas(pdfDocument, 1);
canvas.BeginText()
      .SetFontAndSize(helv, 8)
      .MoveText(100, 194)
      .ShowText("test string")
      .EndText();

which results in

BT
/F1 8 Tf
100 194 Td
(test string) Tj
ET

allowing you to immediately recognize your coordinates.

(Of course this means that the y value is not that of the very bottom of the text but that of its baseline.)


That being said, you mention that you want to go back later and replace that text with something else. Please allow me to recommend against doing so. Text in PDF content streams is not meant for such editing. You can find many questions here on Stack Overflow by people who tried that and ran into trouble after a start that appeared easy. Read this answer enumerating some of the hindrances.

Even if you only want to edit documents you create yourself and, therefore, control the hindrances therein to a degree, you are not safe from such problems, e.g. after library updates.

An alternative would be the use of AcroForm form fields (which you can make read-only to prevent accidental manipulations and even flatten as soon as no changes are expected anymore).

mkl
  • Luckily, this is meant to be a short-term thing. And the PDFs I will edit are only meant to last days at most before being printed, so I won't need to change something on a PDF older than that, if at all. – Phil M Mar 27 '20 at 16:21