
I need to optimize a number of big PDF documents for file size, so I tried using ghostscript, invoked like this:

gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH -sOutputFile=output-my-doc.pdf input-my-doc.pdf

I can see this running for some pages, but then on particular pages it crashes.

After updating to gs version 9.02, I see the same behavior. I burst the document into separate pages and ran the command above on each page, which let me confirm which pages are the problematic ones. In fact, the error occurs even if I call just gs input-my-doc-pageX.pdf: this starts a viewer, and I could see the text being typeset until it reached an image, at which point it crashed.
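
The per-page check described above could be scripted roughly as follows. This is only a sketch: find_failing_pages is a hypothetical helper name, the pages are assumed to have been split beforehand (for example with pdftk input-my-doc.pdf burst output page-%04d.pdf), and it assumes gs exits with a non-zero status on a bad page, as in the log further down, and that the failure also appears with the nullpage device.

```shell
# Sketch: print the name of every single-page PDF that Ghostscript fails on.
# find_failing_pages is a hypothetical helper, not part of gs or pdftk.
find_failing_pages() {
    for f in "$@"; do
        # nullpage interprets the page but discards the rendered output;
        # assumption: a failing page makes gs exit with a non-zero status.
        if ! gs -q -dNOPAUSE -dBATCH -sDEVICE=nullpage "$f" >/dev/null 2>&1; then
            printf '%s\n' "$f"
        fi
    done
}
```

Called as find_failing_pages page-*.pdf, it lists only the files gs could not process, so the rest can be optimized as usual.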

So I could confirm that in my case, gs crashes on specific images - and finally I can also provide a minimal working (or rather, non-working) example, which demonstrates the problem (below). In particular, the problem seems to be 8-bit RGB images, specified in a certain way.

 

Now, I cannot tell if this is a bug, but since I need to get this done - I was thinking that maybe I could "cheat" ghostscript, by running the PDFs through an application, which would pretty much leave the PDFs untouched - except that it would re-encode the images to a single format (say, PNG); so that the gs optimizer could run over these files too without crashing.

What options do I have to re-encode only the images of a given PDF using the command line in Linux?

Many thanks in advance for any answers,
Cheers!

 

PS: The test case is basically the source-code PDF example from the post "Imagemagick: generate raw image data for PDF flate embedding?".

That PDF (hello2.pdf) opens just fine in, say, evince:

[Screenshot: hello2.pdf displayed correctly in evince]

... but since its xref table is corrupt, I repair it:

$ pdftk hello2.pdf output hello2O.pdf
$ qpdf --check hello2O.pdf 
checking hello2O.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
No errors found

The repaired file hello2O.pdf also opens fine in evince - however, when I try to run the above gs optimizing command on it, it fails:

$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH -sOutputFile=optihello2O.pdf hello2O.pdf
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Loading NimbusSanL-Regu font from /usr/share/ghostscript/9.02/Resource/Font/NimbusSanL-Regu... 2756020 1410650 1869284 568021 3 done.
Error: /undefined in --run--
Operand stack:
   --dict:6/15(L)--   false   --dict:11/19(L)--   --dict:4/4(L)--   --nostringval--   FlateDecode   --dict:4/4(L)--   0
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push   --nostringval--   --nostringval--   --nostringval--   false   1   %stopped_push   1910   1   3   %oparray_pop   1909   1   3   %oparray_pop   1893   1   3   %oparray_pop   --nostringval--   --nostringval--   2   1   1   --nostringval--   %for_pos_int_continue   --nostringval--   --nostringval--   --nostringval--   --nostringval--   %array_continue   --nostringval--   false   1   %stopped_push   --nostringval--   %loop_continue   --nostringval--   576   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   --nostringval--   %array_continue   --nostringval--   --nostringval--
Dictionary stack:
   --dict:1160/1684(ro)(G)--   --dict:1/20(G)--   --dict:82/200(L)--   --dict:82/200(L)--   --dict:108/127(ro)(G)--   --dict:295/300(ro)(G)--   --dict:23/30(L)--   --dict:6/8(L)--   --dict:25/40(L)--   --dict:7/17(L)--
Current allocation mode is local
GPL Ghostscript 9.02: Unrecoverable error, exit code 1
sdaau

1 Answer


First, if you find a Ghostscript bug, please report it to us at http://bugs.ghostscript.com

Secondly, I suggest you update to the current shipping version, 9.05, which probably has this bug fixed.
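
To check whether an installed Ghostscript is already at that version, something like the following might do. It is a minimal sketch: version_at_least is a hypothetical helper, and it assumes sort -V (GNU coreutils) is available and that gs --version prints just the version number.

```shell
# Sketch: succeed when version $1 is greater than or equal to version $2.
# version_at_least is a hypothetical helper; sort -V is assumed to be available.
version_at_least() {
    # sort -V puts the lower version first; if that is $2, then $1 >= $2.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Possible usage (not run here, since it needs gs on the PATH):
#   version_at_least "$(gs --version)" 9.05 || echo "Ghostscript is older than 9.05"
```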

KenS