I want to replace a special string in the content of a PDF file.

e.g. I want to replace "111" with "abc".

I'm using iTextSharp and C#.

Joris Schellekens
  • For an approach you may want to read [this answer](https://stackoverflow.com/a/57398454/1729265) and port the code from Java to C#. But be aware that the restrictions mentioned in @Joris' answer do apply. – mkl Aug 23 '19 at 08:13

1 Answer

This is (in general) not possible.

PDF is a rendered document format, not a structured one (at least not by default).

Structured document:

  • the document has information about its content
  • e.g. 'this is a paragraph'
  • e.g. 'this is a title just below that paragraph'
  • etc.

Rendered document:

  • character 'H' is drawn at location 50, 50 in black
  • character 'e' is drawn at location 55, 50 in black
  • etc.
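To make the difference concrete, here is a toy model of a rendered page. It is a sketch under stated assumptions, not iText's API: the `RenderedPage` class, its `draw` and `guessText` methods, and the spacing threshold are all hypothetical. The page is nothing but a list of "draw glyph at (x, y)" operations, so getting words back out of it is a guess based on positions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy model (not iText's API): a rendered page only stores
// "draw character c at (x, y)" operations, with no notion of words or lines.
class RenderedPage {
    record Glyph(char ch, double x, double y) {}

    final List<Glyph> glyphs = new ArrayList<>();

    void draw(char ch, double x, double y) {
        glyphs.add(new Glyph(ch, x, y));
    }

    // Recovering text is a heuristic: sort top-to-bottom (PDF y grows upward),
    // then left-to-right, and *guess* that a large horizontal gap is a space
    // and a different y is a new line. The page itself never says so.
    String guessText(double spaceThreshold) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> -g.y())
                .thenComparingDouble(Glyph::x));
        StringBuilder sb = new StringBuilder();
        Glyph prev = null;
        for (Glyph g : sorted) {
            if (prev != null) {
                if (g.y() != prev.y()) sb.append('\n');
                else if (g.x() - prev.x() > spaceThreshold) sb.append(' ');
            }
            sb.append(g.ch());
            prev = g;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RenderedPage page = new RenderedPage();
        // Drawing order need not match reading order.
        page.draw('e', 55, 50);
        page.draw('H', 50, 50);
        page.draw('y', 70, 50);
        System.out.println(page.guessText(7)); // prints "He y"
    }
}
```

Whether the result reads "He y" or "Hey" depends entirely on the threshold you pick. That is the point: the word boundaries are inferred, not stored.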

The problem with removing (or adding) content is that the surrounding content then has to be reflowed.

Imagine the following text:

Once upon a midnight dreary,
while I pondered weak and weary,
over many a quaint and curious volume
of forgotten lore.

If I remove the word 'midnight' (in a rendered document), I would get:

Once upon a ________ dreary,
while I pondered weak and weary,
over many a quaint and curious volume
of forgotten lore.

In other words, because the document doesn't contain information about what belongs together (is this a word? a line? a paragraph?), it can't magically put the text back together when you remove something.
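The 'midnight' example can be reproduced with a small sketch. It assumes a hypothetical one-cell-per-character layout and made-up helpers (`layout`, `removeRange`, `paint`), not anything from iText. Deleting glyphs leaves every other glyph exactly where it was painted, so a hole appears instead of a reflowed line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of one rendered line (not iText's API):
// each glyph is painted at a fixed x, one cell per character.
class GapDemo {
    record Glyph(char ch, int x) {}

    // "Render" a string: glyph i is painted at x = i. Spaces are kept as
    // glyphs here only to make the output easy to read.
    static List<Glyph> layout(String text) {
        List<Glyph> glyphs = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            glyphs.add(new Glyph(text.charAt(i), i));
        }
        return glyphs;
    }

    // Remove every glyph with x in [fromX, toX). Nothing else moves: the
    // page does not record that the remaining glyphs form one line.
    static List<Glyph> removeRange(List<Glyph> glyphs, int fromX, int toX) {
        List<Glyph> kept = new ArrayList<>();
        for (Glyph g : glyphs) {
            if (g.x() < fromX || g.x() >= toX) kept.add(g);
        }
        return kept;
    }

    // Paint glyphs into a line of the given width; cells that no glyph
    // covers are shown as '_' so the hole is visible.
    static String paint(List<Glyph> glyphs, int width) {
        char[] cells = new char[width];
        Arrays.fill(cells, '_');
        for (Glyph g : glyphs) cells[g.x()] = g.ch();
        return new String(cells);
    }

    public static void main(String[] args) {
        String line = "Once upon a midnight dreary,";
        int start = line.indexOf("midnight");
        List<Glyph> after = removeRange(layout(line), start, start + "midnight".length());
        System.out.println(paint(after, line.length())); // prints "Once upon a ________ dreary,"
    }
}
```

Closing the gap would mean shifting "dreary," left, and then deciding whether words from the next line should move up. Without structure information, there is no reliable way to decide which glyphs belong to that "line".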

You'll encounter a similar problem when you're trying to add text.

I know there is an example on the iText website that replaces one string with another in a PDF document. The key difference there is that the target and replacement strings have roughly the same (rendered) length, so no reflow is needed.
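A rough way to decide whether an in-place swap is reasonable is to compare the rendered widths of the two strings. The following is only a sketch. The width table (in 1/1000 em units), the `FitCheck` class, and the tolerance are hypothetical; real code would take the widths from the font actually used on the page.

```java
import java.util.Map;

// Hypothetical "same rendered length" check (not iText's API).
class FitCheck {
    // Assumed advance widths in 1/1000 em; a real implementation would
    // read these from the font used in the PDF.
    static final Map<Character, Integer> WIDTHS = Map.of(
            '1', 556, 'a', 556, 'b', 556, 'c', 500);

    // Rendered width in points: sum of glyph widths scaled by the font size.
    static double width(String s, double fontSize) {
        int units = 0;
        for (char c : s.toCharArray()) {
            Integer w = WIDTHS.get(c);
            if (w == null) throw new IllegalArgumentException("no width for '" + c + "'");
            units += w;
        }
        return units * fontSize / 1000.0;
    }

    // An in-place swap only looks acceptable when the replacement is about
    // as wide as the original; otherwise glyphs would overlap or leave a gap
    // and the surrounding content would need reflowing.
    static boolean fitsInPlace(String target, String replacement,
                               double fontSize, double tolerancePt) {
        return Math.abs(width(target, fontSize) - width(replacement, fontSize)) <= tolerancePt;
    }

    public static void main(String[] args) {
        System.out.println(fitsInPlace("111", "abc", 12, 1.0));    // prints "true"
        System.out.println(fitsInPlace("111", "abcabc", 12, 1.0)); // prints "false"
    }
}
```

With these assumed widths, "abc" is less than a point narrower than "111" at 12 pt, so a swap would look fine. "abcabc" is almost twice as wide and would run into whatever follows it.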

Joris Schellekens