How to find and replace text in an existing PDF file with PDFTK (or another command-line application)

Question

Each page of my PDF document has a line with this string:

%REPLACE%

I'd like to find it and replace it with another string.

Does anyone know how to do this with a command-line application such as PDFTK?

This person gave me an important clue; however, I'd like something more direct.

Thanks.

Does this answer your question? [How to program a text search and replace in PDF files](https://stackoverflow.com/questions/220445/how-to-program-a-text-search-and-replace-in-pdf-files) — rogerdpack, Jun 11 '21 at 06:09
I added an answer to the above question of a custom program I wrote for this purpose https://stackoverflow.com/a/67932076/32453 — rogerdpack, Jun 16 '21 at 04:15

score 47 · Answer 1 · edited Apr 11 '13 at 05:43

You can try to modify the content of your PDF as follows.

Uncompress the text streams of the PDF:

pdftk file.pdf output uncompressed.pdf uncompress

Use sed to replace your text with another string:

sed -e "s/ORIGINALSTRING/NEWSTRING/g" <uncompressed.pdf >modified.pdf

If this attempt was successful, re-compress the PDF with pdftk:
```
pdftk modified.pdf output recompressed.pdf compress
```

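Note: This approach does not always work, mainly because of font subsetting. When only the glyphs actually used are embedded, the text is often not stored as plain characters, and characters that the document never used may not render in the replacement.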
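The three steps above can be wrapped in one function that stops early when the search string is not visible in the uncompressed file. This is only a sketch: `pdf_replace_text` is a made-up helper name, not something pdftk provides, and the search string is passed to `grep` and `sed` as a basic regular expression.

```shell
#!/bin/sh
# Sketch only: pdf_replace_text is a hypothetical helper, not a pdftk feature.
# Usage: pdf_replace_text in.pdf SEARCH REPLACE out.pdf
# SEARCH is a basic regex for grep and sed; escape '/' and regex
# metacharacters in SEARCH, and '/' and '&' in REPLACE.
pdf_replace_text() {
    src=$1; search=$2; replace=$3; dst=$4
    work=$(mktemp -d) || return 1

    # Step 1: expand the compressed streams so the text can be searched.
    if ! pdftk "$src" output "$work/uncompressed.pdf" uncompress; then
        rm -rf "$work"; return 1
    fi

    # Stop early if the string is not stored as plain text (for example
    # because of font subsetting); sed would silently change nothing.
    if ! LC_ALL=C grep -q -e "$search" "$work/uncompressed.pdf"; then
        echo "pdf_replace_text: '$search' not found in the uncompressed PDF" >&2
        rm -rf "$work"; return 2
    fi

    # Step 2: replace. LC_ALL=C makes sed treat the file as raw bytes.
    LC_ALL=C sed -e "s/$search/$replace/g" \
        < "$work/uncompressed.pdf" > "$work/modified.pdf"

    # Step 3: recompress into the final file.
    pdftk "$work/modified.pdf" output "$dst" compress
    status=$?
    rm -rf "$work"
    return $status
}
```

For the string in the question, the call would look like `pdf_replace_text file.pdf '%REPLACE%' 'NEWSTRING' recompressed.pdf`. A return code of 2 means the string was not found as plain text, so the replacement was not attempted.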

edited Apr 11 '13 at 05:43 by thirdender · answered Mar 26 '12 at 12:54 by Dingo

I can't make this work with the PDF file exported from Google Docs (even when I choose Arial as the only font). I am afraid that I'd have to use some other application just to write the page and then try the very simple and wonderful code you wrote... – Roger Mar 26 '12 at 14:52
2

With *pdfedit* you have a better chance (if fonts are fully embedded) of editing the text content - http://pdfedit.cz/en/index.html – Dingo Mar 26 '12 at 15:01
2

pdfedit can also be used from the command line without the GUI (see its site for command-line utilities) – Dingo Mar 27 '12 at 12:35
7

Note that this will only work when the text uses the `Tj` command in the PDF along with plain ASCII chars. As soon as octal, hex or glyph references are used, you are lost. – Michael-O Dec 14 '18 at 11:00
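To see which case a given file falls into, it can help to list the text-showing operators in the uncompressed file before trying `sed`. A rough sketch follows; `show_text_ops` is a hypothetical helper name, and the pattern is a heuristic that can also match stray bytes inside binary streams.

```shell
#!/bin/sh
# show_text_ops is a hypothetical helper: it prints the lines of an
# uncompressed PDF that contain a text-showing operator, with line numbers.
# Typical forms you may see:
#   (Hello) Tj            literal string: sed can match it
#   <00480065> Tj         hex string: the text is not visible as ASCII
#   [(He) -20 (llo)] TJ   array with kerning: the word is split into pieces
show_text_ops() {
    # -a: treat the file as text, -n: line numbers, -E: extended regex.
    # Matches ')', '>' or ']' followed by optional whitespace and Tj or TJ.
    LC_ALL=C grep -a -n -E '[])>][[:space:]]*T[jJ]' "$1"
}
```

Run it as `show_text_ops uncompressed.pdf | head`. If the output shows mostly hex strings or split arrays, a plain search for the whole word is unlikely to match.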
For anyone with Mac M1 this might be useful - https://stackoverflow.com/questions/60859527/how-to-solve-pdftk-bad-cpu-type-in-executable-on-mac – PeteW Sep 09 '21 at 15:51
I had to replace `sed`, because of encoding issues, with `perl -pi.bak -e 's/findthis/replacewiththis/g' uncompressed.pdf` from https://stackoverflow.com/a/6995010/241542 – pgericson Jan 03 '22 at 09:27
Can `sed` use a regex here? Without a regex, it works. But with a regex it says `Error: Unable to find file. Error: Failed to open PDF file: modified.pdf Errors encountered. No output created. Done. Input errors, so no output created.` – Nor.Z May 20 '22 at 20:47
I suspect pdfbox has something available that would help with the font subsetting. I have an example to start working with: forked from https://gist.github.com/DavidYKay/82f20ba67c50c499ebb3 from https://jackson-brain.com/using-pdfbox-to-locate-text-coordinates-within-a-pdf-in-java/ – jcalfee314 Feb 23 '23 at 15:11

score 1 · Answer 2 · answered Jun 10 '21 at 14:55

For making a small change just on a few pages, inkscape can do a good job. It can also fix some issues in diagrams and with table borders. One must process each page separately, though, and stick the pages back together using pdfunite. (Unchanged page ranges can be extracted with pdfseparate.)
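The split and join steps around the Inkscape edit could look roughly like this. It is a sketch that assumes poppler's `pdfseparate` and `pdfunite` are installed; `split_pages`, `join_pages` and the `page-0001.pdf` naming are choices made for this example.

```shell
#!/bin/sh
# Hypothetical helpers around poppler's pdfseparate and pdfunite.

# Split every page of a PDF into DIR/page-0001.pdf, DIR/page-0002.pdf, ...
# The zero-padded %04d keeps the files in page order when globbed.
# Usage: split_pages in.pdf DIR
split_pages() {
    mkdir -p "$2" && pdfseparate "$1" "$2/page-%04d.pdf"
}

# Join the (possibly edited) pages in DIR back into one PDF.
# Usage: join_pages DIR out.pdf
join_pages() {
    pdfunite "$1"/page-*.pdf "$2"
}
```

A typical run would be `split_pages in.pdf pages`, then open the page to change (for example `inkscape pages/page-0003.pdf`), save it as PDF under the same name, and finish with `join_pages pages out.pdf`.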

Inspiration: https://tatica.org/2015/07/13/edit-pdf-inkscape/

answered Jun 10 '21 at 14:55 by Joachim Wagner

1

For simple changes, this works with Inkscape. Inkscape 1.2 (released on 2022-05-16) supports multi-page PDF documents for both import and export, so `pdfunite` is no longer needed. To be able to edit the text, one first needs to Ungroup the object that consists of a full PDF page. – vinc17 Sep 18 '22 at 02:04
"Inkscape encountered an internal error and will close now" – Dan Dascalescu Nov 17 '22 at 22:24

score -1 · Answer 3 · answered Jun 05 '21 at 01:26

changepagestring will do this in a single step, as easy as:

changepagestring -o -v infile.pdf search-regex replace-str outfile.pdf

However, like the currently accepted answer, this is hit-or-miss and doesn't work as expected with all files.

answered Jun 05 '21 at 01:26 by Brian Z

1

Yes, sadly this didn't work with my file. It could find 2 letters but not the whole word I wanted to find – Matthew Lock Sep 20 '21 at 07:10
1

I've been finding that when this fails, it's often just a matter of finding the right regex. I haven't figured out if there's a way to see the text exactly as needed to understand how it works, but a regex like 'word1*word2' may work where 'word1 word2' fails. – Brian Z Jul 21 '22 at 13:02
1

In Debian/unstable, `changepagestring` does not work at all (I've tried on a single word, so this is simpler than a regexp), even on a simple PDF file obtained with `pdflatex`, for which `pdftotext` can find the word. [Debian bug 1019979](https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1019979). – vinc17 Sep 18 '22 at 02:09
Didn't work for my PDF: couldn't find even a 2-letter word, and when using one letter, the output lost most of the formatting, which should not be touched by a search & replace. – Dan Dascalescu Nov 17 '22 at 22:22
