I need to optimize a number of big PDF documents for file size, so I tried using ghostscript, invoked like this:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH -sOutputFile=output-my-doc.pdf input-my-doc.pdf
I can see this running for some pages, but then it crashes on particular pages. I updated to gs version 9.02 and I see the same behavior. After bursting the document into separate pages and running the command above on each page, I could confirm which pages are the problematic ones; in fact, the error occurs even if I call just gs input-my-doc-pageX.pdf - this starts a viewer, and I could see the text being typeset until it reached an image, at which point it crashed.
So I could confirm that in my case, gs crashes on specific images - and finally I can also provide a minimal working (or rather, non-working) example, which demonstrates the problem (below). In particular, the problem seems to be 8-bit RGB images, specified in a certain way.
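For reference, the page-by-page check described above could be sketched roughly as follows. This is only an illustration of the approach, not the exact commands I ran: the function name find_bad_pages and the page_*.pdf file names are hypothetical, and it assumes that gs exits with a non-zero status when it hits the error (as it does in the log further below).

```shell
#!/bin/sh
# Print the name of every single-page PDF on which the gs pdfwrite pass fails.
# find_bad_pages is a hypothetical helper name; the gs flags are the ones
# from the command at the top of this post.
find_bad_pages() {
    # Scratch file for the gs output; only the exit status matters here.
    tmp_out=$(mktemp) || return 1
    for page in "$@"; do
        if ! gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen \
                -dNOPAUSE -dBATCH -sOutputFile="$tmp_out" "$page" >/dev/null 2>&1; then
            printf '%s\n' "$page"
        fi
    done
    rm -f "$tmp_out"
}

# Possible usage, assuming the document was first burst into pages with pdftk
# (the page_%04d.pdf naming pattern is an arbitrary choice):
#   pdftk input-my-doc.pdf burst output page_%04d.pdf
#   find_bad_pages page_*.pdf
```

The function only reports which page files fail; it does not try to fix them.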
Now, I cannot tell if this is a bug, but since I need to get this done, I was thinking that maybe I could "cheat" ghostscript by running the PDFs through an application that would leave them pretty much untouched, except that it would re-encode the images to a single format (say, PNG), so that the gs optimizer could run over these files too without crashing.
What options do I have to re-encode only the images of a given PDF using the command line in Linux?
Many thanks in advance for any answers,
Cheers!
PS: The test case is basically the source-code PDF example in the post: Imagemagick: generate raw image data for PDF flate embedding?.
That PDF (hello2.pdf) opens just fine in, say, evince, but since its xref table is corrupt, I repair it:
$ pdftk hello2.pdf output hello2O.pdf
$ qpdf --check hello2O.pdf
checking hello2O.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
No errors found
The repaired file hello2O.pdf also opens fine in evince - however, when I try to run the above gs optimizing command on it, it fails:
$ gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dPDFSETTINGS=/screen -dNOPAUSE -dBATCH -sOutputFile=optihello2O.pdf hello2O.pdf
GPL Ghostscript 9.02 (2011-03-30)
Copyright (C) 2010 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 1.
Page 1
Loading NimbusSanL-Regu font from /usr/share/ghostscript/9.02/Resource/Font/NimbusSanL-Regu... 2756020 1410650 1869284 568021 3 done.
Error: /undefined in --run--
Operand stack:
--dict:6/15(L)-- false --dict:11/19(L)-- --dict:4/4(L)-- --nostringval-- FlateDecode --dict:4/4(L)-- 0
Execution stack:
%interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push 1910 1 3 %oparray_pop 1909 1 3 %oparray_pop 1893 1 3 %oparray_pop --nostringval-- --nostringval-- 2 1 1 --nostringval-- %for_pos_int_continue --nostringval-- --nostringval-- --nostringval-- --nostringval-- %array_continue --nostringval-- false 1 %stopped_push --nostringval-- %loop_continue --nostringval-- 576 --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- --nostringval-- %array_continue --nostringval-- --nostringval--
Dictionary stack:
--dict:1160/1684(ro)(G)-- --dict:1/20(G)-- --dict:82/200(L)-- --dict:82/200(L)-- --dict:108/127(ro)(G)-- --dict:295/300(ro)(G)-- --dict:23/30(L)-- --dict:6/8(L)-- --dict:25/40(L)-- --dict:7/17(L)--
Current allocation mode is local
GPL Ghostscript 9.02: Unrecoverable error, exit code 1