I have a large PDF (~20 MB, 160 MB uncompressed). I need to find and replace text in it, about 1000 times. Here is what I tried.
Via SVG
- Transform to SVG (Inkscape)
- Read SVG line by line and do the replace in the file
- Transform back to PDF
=> bad output: the text is not rendered correctly, probably due to some geometric transform matrix in the SVG
Creating ~1000 sed commands
- Uncompress PDF
- Perform each replace with a sed command
- Recompress PDF
=> way too slow: each sed command takes about 20 seconds, which adds up to several hours of processing
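One way to avoid rereading the 160 MB file for every replacement might be to put all the substitutions into a single sed script and run it once with `sed -f`. The following is only a minimal sketch, assuming the pairs are stored as tab-separated `old<TAB>new` lines and contain no `|`, `&`, backslashes, or regex metacharacters. The file names and the helper functions `build_sed_script` and `apply_replacements` are made up for illustration.

```shell
#!/bin/sh
# build_sed_script PAIRS_FILE
# Print one "s|old|new|g" command per tab-separated "old<TAB>new" line.
# Assumption: the strings need no escaping for sed (see the note above).
build_sed_script() {
    tab=$(printf '\t')
    while IFS="$tab" read -r old new; do
        printf 's|%s|%s|g\n' "$old" "$new"
    done < "$1"
}

# apply_replacements PAIRS_FILE INPUT OUTPUT
# Run every substitution in a single pass over INPUT.
apply_replacements() {
    script=$(mktemp)
    build_sed_script "$1" > "$script"
    # LC_ALL=C makes sed work on raw bytes instead of multibyte characters.
    LC_ALL=C sed -f "$script" "$2" > "$3"
    rm -f "$script"
}

# Small stand-in for the real data (hypothetical file names).
demo=$(mktemp -d)
printf 'alpha\tALPHA\nbeta\tBETA\n' > "$demo/pairs.tsv"
printf '(alpha) Tj\n(beta gamma) Tj\n' > "$demo/in.txt"
apply_replacements "$demo/pairs.tsv" "$demo/in.txt" "$demo/out.txt"
cat "$demo/out.txt"
```

The file is then read and written once rather than about 1000 times. Each line is still checked against every pattern, so the actual speedup would have to be measured on the real file.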
Read line-by-line and replace
- Uncompress PDF
- Read the PDF line by line
- Find the text to be replaced
- Replace it using Perl
- Write the line to a new file
- Compress the new file
=> the new file is apparently damaged, because binary data streams remain in the uncompressed PDF and get written out as lines of text
I wonder if it would be possible to read the uncompressed PDF line by line, but edit it in place. How could I do this?
I have looked into Perl in-place editing, but it applies the changes to the whole file at once, while I'd like to edit a single line.
Other ideas are more than welcome ;)
Following the advice given, I used CAM::PDF; this was the simplest and most efficient solution.