9

I have a searchable PDF and need to convert it into a non-searchable one.

I tried using Ghostscript to convert it to JPEG and then back to PDF. That does the trick, but the resulting file is unacceptably large.

I also tried using Ghostscript to convert the PDF to PS first and then back to PDF. That works as well, but the quality is not good enough:

    gswin32.exe -q -dNOPAUSE -dBATCH -dSAFER -sDEVICE=pswrite -r1000 -sOutputFile=out.ps in.pdf
    gswin32.exe -q -dNOPAUSE -dBATCH -dSAFER -dDEVICEWIDTHPOINTS=596 -dDEVICEHEIGHTPOINTS=834 -dPDFSETTINGS=/ebook -sDEVICE=pdfwrite -sOutputFile=out.pdf out.ps

Is there a way to improve the quality of the resulting PDF?

Alternatively, is there an easier way to convert a searchable PDF into a non-searchable one?

Cody Gray - on strike
Steven Yong

3 Answers

20

You can use Ghostscript to achieve that. You need 2 steps:

  1. Convert the PDF to a PostScript file in which all fonts used are converted to outline shapes. The key here is the -dNOCACHE parameter:

    gs -o somepdf.ps -dNOCACHE -sDEVICE=pswrite somepdf.pdf

  2. Convert the PS back to PDF (and, optionally, delete the intermediate PS file afterwards):

    gs -o somepdf-with-outlines.pdf -sDEVICE=pdfwrite somepdf.ps
    rm somepdf.ps

Note that the resulting PDF will very likely be larger than the original one. Also, unless you add command line parameters to prevent it, all images in the original PDF will likely be re-encoded according to Ghostscript's built-in defaults. Still, the quality should be better than with your own attempt to use Ghostscript.


Update

Apparently, from version 9.15 (to be released during September/October 2014), Ghostscript will support a new command line parameter:

    -dNoOutputFonts

which will cause the output devices pdfwrite, ps2write and eps2write "to 'flatten' glyphs into 'basic' marking operations (rather than writing fonts to the output)".

This means that the above two steps can be avoided, and the desired result be achieved with a single command:

    gs -o somepdf-with-outlines.pdf -dNoOutputFonts -sDEVICE=pdfwrite somepdf.pdf

Caveats: I've tested this with a few input files using a self-compiled Ghostscript based on current Git sources. It worked flawlessly in each case.
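
One way to check the result could look like the sketch below. It assumes the poppler-utils tools `pdffonts` and `pdftotext` are installed (neither is mentioned in this answer), and the helper names `count_fonts` and `has_text` are made up for illustration. The idea is that a PDF whose glyphs were flattened to outlines should list no fonts and yield no extractable text.

```shell
# Hypothetical helpers; assume poppler-utils (pdffonts, pdftotext) is installed.

# count_fonts FILE: print how many fonts pdffonts lists for FILE.
count_fonts() {
    # pdffonts prints two header lines, then one line per font.
    pdffonts "$1" | tail -n +3 | wc -l | tr -d ' '
}

# has_text FILE: succeed if pdftotext extracts any letter or digit from FILE.
has_text() {
    pdftotext "$1" - | grep -q '[[:alnum:]]'
}
```

If the flattening worked as intended, `count_fonts somepdf-with-outlines.pdf` should report zero fonts and `has_text somepdf-with-outlines.pdf` should fail, while both checks should behave the opposite way on the original searchable file.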

Kurt Pfeifle
  • In my humble opinion, it is better to convert to PostScript first: `gs -dBATCH -dNOPAUSE -dNOCACHE -dSAFER -sDEVICE=pswrite -sOutputFile=output.ps input.pdf` and then pass the resulting PostScript to *ps2pdf*. If your page has a custom page size (e.g. 17x24 cm), you need to pass the *-g* switch to *ps2pdf*, for instance: `ps2pdf -g4820x6800 input.ps output.pdf`. In other words, look up the page size in points (with pdfinfo) and multiply both dimensions by 10 – Dingo Apr 10 '12 at 12:56
  • 2
    @Dingo: If you use a recent GS version, `-o out.ps` is the same as `-dBATCH -dNOPAUSE -sOutputFile=out.ps`. Also, a recent version of GS does set the output file's (PostScript) page size automatically the same as the input file's (PDF) was. If you want to be on the safe side, you may additionally set it with `-gNNNxMMM` without a problem. I don't like `ps2pdf` for most cases, because it is a wrapper around a Ghostscript commandline anyway.... -- So, what's left as a difference between your newest recommendation and mine? – Kurt Pfeifle Apr 10 '12 at 13:49
  • You are right. I did not read carefully enough before. Excuse me, and thanks for the great tips! – Dingo Apr 10 '12 at 16:01
  • It worked well with `gs -o somepdf-with-outlines.pdf -dNoOutputFonts -sDEVICE=pdfwrite somepdf.pdf` on my gentoo system too. 9.10 is too old, but 9.15 converts well to curves. – Jonas Stein Feb 26 '15 at 10:48
  • Maybe some highlights on the one-step solution :) – wuxb Mar 08 '18 at 16:49
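
The "multiply both dimensions by 10" rule from the page-size comment above can be sketched as a small helper. It rests on two assumptions: that `-g` takes device pixels, and that pdfwrite's default resolution is 720 dpi, so that one point (1/72 inch) maps to 10 pixels. The function name `gs_geometry` is hypothetical, and it only accepts whole-number point sizes.

```shell
# gs_geometry WIDTH_PT HEIGHT_PT [DPI]: print a WxH value for Ghostscript's -g switch.
# Hypothetical helper; DPI defaults to 720, the assumed pdfwrite default resolution.
gs_geometry() {
    dpi=${3:-720}
    # pixels = points * dpi / 72, rounded to the nearest integer
    w=$(( ($1 * dpi + 36) / 72 ))
    h=$(( ($2 * dpi + 36) / 72 ))
    echo "${w}x${h}"
}

gs_geometry 482 680   # prints 4820x6800, the value used in the comment
```

A 17x24 cm page is roughly 482 x 680 points, which is where the `-g4820x6800` in the comment comes from.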
4

One possible way to produce a non-searchable vector PDF from a searchable vector PDF is:

  1. Burst the PDF into single pages:

    pdftk file.pdf burst

  2. Convert each single page to SVG with pdftocairo (part of poppler-utils):

    for f in *.pdf; do pdftocairo -svg "$f"; done

  3. Delete ALL PDF files in the folder.

  4. Convert ALL the SVG files back to PDF with batik-rasterizer (this time the resulting PDFs stay vector-based, but are no longer searchable):

    java -jar ./batik-rasterizer.jar -m application/pdf *.svg

  5. Finally, join all the resulting single-page PDFs into one multi-page PDF file:

    pdftk *.pdf cat output out.pdf
Dingo
  • an [alternative tool](http://manpages.ubuntu.com/manpages/xenial/man1/rasterizer.1.html) for step 4 can be used with the following command: `$ for f in *.svg; do rasterizer -m application/pdf $f; done`. PS: I am not quite sure how these two tools are related or overlap, though... – nutty about natty Oct 12 '16 at 08:40
  • expanding the scope of the original question, this would be a way to crop the resulting file: http://tex.stackexchange.com/a/42259/27721 – nutty about natty Oct 12 '16 at 08:46
0

I think converting to an image format such as JPEG is the way to go. It might be worth converting the pages to images, optimizing/reducing the size of those images, and then creating a PDF from them.
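
A sketch of that image route that stays inside Ghostscript, instead of going through separate JPEG files: recent Ghostscript versions offer the `pdfimage8`, `pdfimage24` and `pdfimage32` devices, which render each page to a bitmap and wrap it in a PDF. The function name `rasterize_pdf` and the 150 dpi default are assumptions for illustration, not something taken from this answer; lower resolutions should give smaller files at the cost of sharpness.

```shell
# rasterize_pdf IN OUT [DPI]: render every page of IN as an image and
# write an image-only PDF to OUT.
# Hypothetical helper; assumes a Ghostscript build that includes the pdfimage24 device.
rasterize_pdf() {
    dpi=${3:-150}   # assumed default; raise for sharper text, lower for smaller files
    gs -o "$2" -sDEVICE=pdfimage24 -r"$dpi" "$1"
}
```

For example, `rasterize_pdf in.pdf out.pdf 200` would render at 200 dpi. Trying a few resolutions and comparing file sizes is probably the quickest way to find an acceptable trade-off.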

Mark Redman