
When researching how to compress a bunch of PDFs with pictures inside (ideally in a lossless fashion, but I'll settle for lossy) I found that a lot of people recommend doing this:

$ pdf2ps file.pdf
$ ps2pdf file.ps

This works! The resulting file is smaller and looks at least good enough.

  • How / why does this work?
  • Which settings can I tweak in this process?
  • If there is some lossy conversion, which one is that?
  • Where is the catch?
Kurt Pfeifle
vektor
  • If you pdf2ps the resulting PDF file does anything change if you diff it against file.ps? – dsolimano Apr 28 '15 at 14:24
  • Yes, there is a ton of changes. Just for the illustration: original is 2.9 MB, the first PS is 3.2 MB, compressed PDF is 0.5 MB and the last PS is 3.6 MB... – vektor Apr 28 '15 at 14:29
  • Can you post sample files? I suspect that you are losing font information in this process. Can you copy/paste text from the final PDF? Does it work in the original? Do the files look the same when displayed? Did you try zooming in/out? – yms Apr 28 '15 at 14:31
  • Sorry, I should not publish those PDFs. Inspection reveals that they are basically JPEG images (I can see artifacts) stored in the PDF, and the final result simply has more artifacts, possibly from lower JPEG quality settings. However, I am looking for a general insight into how those 2 commands work. – vektor Apr 28 '15 at 14:34

1 Answer


People who recommend this procedure rarely do so from a background of expertise or knowledge; it is based rather on gut feeling.

The detour of generating a new PDF via PostScript and back (also called "refrying a PDF") is never going to give you optimal results. Sometimes it is useful, for example in cases where the original PDF can't be printed at all, or cannot be processed by another application. But these cases are very rare.

In any case, this "roundtrip" conversion will never reproduce the PDF file you started with.

Also, the pdf2ps and ps2pdf tools aren't independent tools at all: they are just simple wrapper scripts around a Ghostscript (gs or gswin32c.exe) command line. You can check that yourself by doing:

cat $(which ps2pdf)
cat $(which pdf2ps)

This will also reveal the (default) parameters these simple wrappers use for the respective conversions.

If you are unlucky, you will have an ancient Ghostscript installed. The PostScript that pdf2ps then generates will be Level 1 PS, and this will be "lossy" for many fonts used by more modern PDF files: fonts that were vector outlines before get rasterized. Not exactly the output you'd like to look at...

Since both tools use Ghostscript anyway (behind your back), you are better off running Ghostscript yourself. This gives you more control over the parameters it uses. A particular advantage is that this way you get a direct PDF->PDF conversion, without any detour via an intermediate PostScript file.
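A direct PDF->PDF run could look roughly like the sketch below. The function name compress_pdf, the file names and the default preset are my own placeholders, not something the wrapper scripts define; -sDEVICE=pdfwrite and -dPDFSETTINGS are regular Ghostscript options, and the presets I know of are /screen, /ebook, /printer, /prepress and /default. How much each preset shrinks a given file depends on the file and on your Ghostscript version.

```shell
# Sketch: recompress a PDF directly with Ghostscript, skipping the PostScript detour.
# compress_pdf is a hypothetical helper name.
#   $1 = input PDF, $2 = output PDF, $3 = pdfwrite preset (assumed default: /ebook)
compress_pdf() {
    gs -o "$2" -sDEVICE=pdfwrite -dPDFSETTINGS="${3:-/ebook}" "$1"
}

# usage (placeholder file names):
#   compress_pdf file.pdf file-small.pdf /screen
```

Comparing the sizes of the input and output afterwards (ls -l) tells you whether a given preset was worth it for your files.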

Here are a few answers which would give you some hints about what parameters you could use in order to drive the file size down in a semi-controlled way in your output PDF:
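As one illustration of the kind of parameters meant here (my own sketch, not taken from any particular answer): since the PDFs in question are mostly embedded JPEGs, the biggest lever is usually image downsampling, which is a lossy step. The helper name downsample_pdf and the 150 dpi default are assumptions for the example; the -dDownsampleColorImages, -dColorImageDownsampleType and -dColorImageResolution switches are pdfwrite distiller parameters.

```shell
# Sketch: lossy size reduction by downsampling color images during PDF->PDF conversion.
# downsample_pdf is a hypothetical helper name.
#   $1 = input PDF, $2 = output PDF, $3 = target resolution in dpi (assumed default: 150)
downsample_pdf() {
    gs -o "$2" -sDEVICE=pdfwrite \
       -dDownsampleColorImages=true \
       -dColorImageDownsampleType=/Bicubic \
       -dColorImageResolution="${3:-150}" \
       "$1"
}

# usage (placeholder file names):
#   downsample_pdf file.pdf file-150dpi.pdf 150
```

Grayscale and monochrome images have their own, similarly named parameters, so a scan-heavy file may need those set as well.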

Kurt Pfeifle