
I'm trying to write a small program for Linux to resize PDFs and adjust margins. My plan was to use Ghostscript as a back-end. This Terminal command successfully resizes most PDFs:

gs -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dFIXEDMEDIA -dPDFFitPage \
 -dDEVICEWIDTHPOINTS=300 -dDEVICEHEIGHTPOINTS=400 -sOutputFile=out.pdf file.pdf

The -dPDFFitPage option scales pages to fit the new size, adding whitespace as padding if the image aspect ratio doesn't match the specified dimensions. Removing -dPDFFitPage changes the page size without scaling: pages are cropped if they are too large, or padded with whitespace if they are too small.
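For the program itself, the command above could be wrapped in a small shell function. This is only a sketch: resize_pdf and its optional fifth argument are hypothetical names, while the Ghostscript options are exactly the ones from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resize_pdf IN OUT WIDTH HEIGHT [fit]
# WIDTH and HEIGHT are in PostScript points (1/72 inch). Passing "fit" as the
# fifth argument adds -dPDFFitPage (scale the content to the new size);
# otherwise only the page size changes and the content is not scaled.
resize_pdf() {
    local in=$1 out=$2 w=$3 h=$4 fit=
    if [ "${5:-}" = fit ]; then
        fit=-dPDFFitPage
    fi
    # $fit is deliberately unquoted so that an empty value adds no argument
    gs -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dFIXEDMEDIA $fit \
       -dDEVICEWIDTHPOINTS="$w" -dDEVICEHEIGHTPOINTS="$h" \
       -sOutputFile="$out" "$in"
}
```

For example, resize_pdf file.pdf out.pdf 300 400 fit should run the same command as above.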

However, the command doesn't work with PDFs created by ImageMagick's "convert" program. The PDF is scaled, but no whitespace is added, so only one dimension is correct in the output file. Without the -dPDFFitPage option, oversize images are cropped as expected, but nothing appears to happen if the image is smaller than the new page size (i.e. no whitespace is added).

It appears that the problem lies with the fact that the PDF is empty apart from the image. How can I get Ghostscript to adjust the page size and fill the empty part of the page with white if necessary?

Edit: Example files

To see the problem, try with these example files (there are also example Ghostscript output PDFs).

Alternatively, use ImageMagick (or any image editor) to create a suitable example image yourself:

convert -size 500x500 xc:skyblue -fill black -draw "circle 250,250 0,250" image.png

Now, use ImageMagick (NOT any other program) to convert it to a PDF:

convert image.png file.pdf

Now try this with the Ghostscript code. See what happens when you try it:

  • with and without -dPDFFitPage
  • with the width and height smaller than the original, and with them larger
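To run all four cases in one go, something like the following loop could be used. It assumes file.pdf is the ImageMagick-generated 500x500 PDF from the steps above; the function name try_fit_variants, the larger 700x800 size and the output file names are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try_fit_variants IN.pdf
# Runs the Ghostscript command from the question four times: a smaller
# (300x400) and a larger (700x800) target size, each with and without
# -dPDFFitPage. Output names look like out-300x400.pdf or out-300x400-fit.pdf.
try_fit_variants() {
    local in=$1 size w h fit out
    for size in 300x400 700x800; do
        w=${size%x*}            # part before the "x"
        h=${size#*x}            # part after the "x"
        for fit in "" -dPDFFitPage; do
            out="out-${size}${fit:+-fit}.pdf"
            # $fit is deliberately unquoted: an empty value must add no argument
            gs -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dFIXEDMEDIA $fit \
               -dDEVICEWIDTHPOINTS="$w" -dDEVICEHEIGHTPOINTS="$h" \
               -sOutputFile="$out" "$in" && echo "wrote $out"
        done
    done
}
```

Usage: try_fit_variants file.pdf, then open the four out-*.pdf files side by side.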

To see how it is supposed to work, try using any other tool to convert the example image to a PDF. You could (for example) use LibreOffice or LaTeX, or take the PDF you just made (the one that didn't work) and "Print" it to create another PDF (which for some reason will work). Make sure the image fills the entire page of the PDF (there should be no whitespace/border in the PDF you use to test with Ghostscript, but the output PDF created by Ghostscript should have some whitespace.)

HullCityFan852
  • I can't see any way to get the behaviour you describe. You are going to have to provide an example file to look at before anyone can help. – KenS May 23 '15 at 08:17
  • Thanks for taking a look at this. See my edit. – HullCityFan852 May 23 '15 at 14:05
  • I don't have ImageMagick, and I'm not going to install it to look at your problem. If you want me to take a look, make a file available which exhibits the problem. Put it on dropbox or something. If you feel really keen you could post a working copy too, but the failing one is almost certainly all that's needed. – KenS May 23 '15 at 17:10
  • OK, I assumed you were using a Linux distro that includes a copy of ImageMagick by default. It's the fact that ImageMagick is so prevalent that warrants the extra fuss. Please see my new edit, thanks! – HullCityFan852 May 23 '15 at 23:50

2 Answers


Your original PDF file (NotWorking.pdf) contains a /CropBox in addition to a /MediaBox. This is carried through to the output PDF file, and due to the way that -dPDFFitPage works, it is appropriately modified in the same way as the actual content of the PDF file. The result is that the scaled file looks the same as the original.

It isn't actually the same: the original file has a /MediaBox of [0 0 500 500] and the modified file has a /MediaBox of [0 0 300 400]. But the effect is that it looks the same in a reader which enforces the /CropBox.
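The numbers are consistent with -dPDFFitPage scaling the page uniformly and centring it on the new media. Assuming that is what it does, this sketch (fitted_box is a hypothetical name) computes the box a 500x500 page occupies on a 300x400 page: the scale is min(300/500, 400/500) = 0.6, so the content becomes 300x300 and is shifted up by (400 - 300) / 2 = 50 points, which matches the /CropBox of [0 50 300 350] that the pdfinfo output in the other answer reports.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fitted_box SRC_W SRC_H DST_W DST_H
# Prints "llx lly urx ury" for a SRC_W x SRC_H page scaled uniformly to fit
# inside DST_W x DST_H and centred (an assumed model of -dPDFFitPage).
fitted_box() {
    awk -v sw="$1" -v sh="$2" -v dw="$3" -v dh="$4" 'BEGIN {
        s = dw / sw
        if (dh / sh < s) s = dh / sh        # uniform scale: the smaller ratio wins
        w = sw * s; h = sh * s              # size of the scaled page
        x = (dw - w) / 2; y = (dh - h) / 2  # centring offsets
        printf "%g %g %g %g\n", x, y, x + w, y + h
    }'
}

fitted_box 500 500 300 400
```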

Try running the two 'modified' files back through Ghostscript and see what happens. Ghostscript honours the /MediaBox, not the /CropBox, by default.

Once you've tried running the two output files through Ghostscript, try it with -dUseCropBox.
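The comparison could be scripted roughly as follows. This is a sketch: compare_boxes is a hypothetical name, and it renders at 72 dpi so that one pixel corresponds to one point, which makes the page sizes easy to read off the PNG dimensions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare_boxes IN.pdf
# Renders page 1 of IN.pdf twice with Ghostscript's png16m device: once with
# the default (/MediaBox) and once with -dUseCropBox.
# Writes IN-mediabox.png and IN-cropbox.png next to the input file.
compare_boxes() {
    local in=$1 base=${1%.pdf}
    gs -q -sDEVICE=png16m -r72 -dBATCH -dNOPAUSE -dFirstPage=1 -dLastPage=1 \
       -sOutputFile="${base}-mediabox.png" "$in"
    gs -q -sDEVICE=png16m -r72 -dBATCH -dNOPAUSE -dFirstPage=1 -dLastPage=1 \
       -dUseCropBox -sOutputFile="${base}-cropbox.png" "$in"
}
```

Usage: compare_boxes 'out(NotWorking).pdf' followed by file 'out(NotWorking)'-*.png; for the NotWorking output the two images should have different heights.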

You'll need to...

  • ...either disable the /CropBox,
  • ...or set it to be the same as the /MediaBox,

if necessary by means of a pdfmark operation. You might like to refer to this answer for some more pointers.
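Because NotWorking.pdf already carries a page-level /CropBox, a pdfmark issued before the file is read may simply be overridden by it (see the other answer). One untested possibility, following the EndPage suggestion in the comments on the other answer, is to issue a /PAGE pdfmark from an EndPage procedure, so that it is processed after the page's own /CropBox. The function name is hypothetical, and whether pdfwrite honours the pdfmark at that point is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resize_pdf_reset_cropbox IN OUT WIDTH HEIGHT
# Same resize as in the question, plus an EndPage procedure that, for every
# page emitted by showpage (reason code 0), sets the /CropBox to the whole
# new page. EndPage receives "count reason" on the stack; the procedure drops
# the count and returns "reason 2 ne", the same result as the default EndPage.
# ASSUMPTION: pdfwrite applies a /PAGE pdfmark issued during EndPage.
resize_pdf_reset_cropbox() {
    local in=$1 out=$2 w=$3 h=$4
    gs -q -sDEVICE=pdfwrite -dBATCH -dNOPAUSE -dFIXEDMEDIA -dPDFFitPage \
       -dDEVICEWIDTHPOINTS="$w" -dDEVICEHEIGHTPOINTS="$h" \
       -sOutputFile="$out" \
       -c "<</EndPage {exch pop dup 0 eq {[/CropBox [0 0 $w $h] /PAGE pdfmark} if 2 ne}>> setpagedevice" \
       -f "$in"
}
```

Usage: resize_pdf_reset_cropbox NotWorking.pdf out.pdf 300 400, then check the result with pdfinfo -box out.pdf.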

KenS
  • Thanks. Using this information I got it working by adding the following just before the `file.pdf` in the command given in the question: `-c "<> setpagedevice" -f` – HullCityFan852 May 25 '15 at 01:13

Just an additional pointer...

  • In cases where there are already /CropBox definitions in an input PDF file, the method to provide one via a -c "[...pdfmark" parameter for Ghostscript will not work!

In these cases it often helps to first "disarm" the existing /CropBox keyword inside the PDF file by changing it to lower case: make it read /cropBox (since PDF keywords are case sensitive, it will no longer be recognized/used).

You can do this with any method at your disposal: text editor (use one that doesn't change your EOL characters behind your back!), or sed, or...
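With GNU sed, for example, it could look like this (disarm_cropbox is a hypothetical name). Replacing /CropBox with /cropBox keeps the byte count identical, so the offsets in the file's cross-reference table should remain valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: disarm_cropbox IN.pdf OUT.pdf
# Renames every /CropBox key to /cropBox so that it is no longer recognised.
# LC_ALL=C makes sed treat the file as raw bytes rather than failing on
# invalid multibyte sequences. Assumes GNU sed. Keys stored inside compressed
# /ObjStm object streams are not visible to sed and stay untouched.
disarm_cropbox() {
    LC_ALL=C sed 's|/CropBox|/cropBox|g' "$1" > "$2"
}
```

Usage: disarm_cropbox NotWorking.pdf Disarmed.pdf, then run Ghostscript on Disarmed.pdf instead.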

To check whether any *Box values are defined that differ from the default /MediaBox (which MUST be present in every PDF file), you can use pdfinfo -box. This command always reports values not only for /MediaBox but also for /CropBox, /BleedBox, /ArtBox and /TrimBox. Where /CropBox, /BleedBox, /ArtBox and /TrimBox are not explicitly defined in the PDF document, the tool reports the same values as for /MediaBox:

$ pdfinfo -box "out(NotWorking).pdf"

 Title:          NotWorking
 Producer:       GPL Ghostscript 9.15
 CreationDate:   Sun May 24 00:38:55 2015
 ModDate:        Sun May 24 00:38:55 2015
 Tagged:         no
 UserProperties: no
 Suspects:       no
 Form:           none
 JavaScript:     no
 Pages:          1
 Encrypted:      no
 Page size:      300 x 300 pts
 Page rot:       0
 MediaBox:           0.00     0.00   300.00   400.00
 CropBox:            0.00    50.00   300.00   350.00
 BleedBox:           0.00    50.00   300.00   350.00
 TrimBox:            0.00    50.00   300.00   350.00
 ArtBox:             0.00    50.00   300.00   350.00
 File size:      16316 bytes
 Optimized:      no
 PDF version:    1.5
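Spotting such differences can be automated by filtering the pdfinfo -box output. The sketch below (boxes_differing_from_mediabox is a hypothetical name) assumes the single-page, document-level output format shown here and prints the name of every box whose coordinates differ from the MediaBox.

```shell
#!/usr/bin/env bash
# Hypothetical helper: boxes_differing_from_mediabox
# Reads `pdfinfo -box` output on stdin and prints each box name (CropBox,
# BleedBox, TrimBox, ArtBox) whose four coordinates differ from the MediaBox.
# Order of the printed names is not guaranteed.
boxes_differing_from_mediabox() {
    awk '
        /^ *[A-Za-z]*Box:/ {
            name = $1; sub(/:$/, "", name)
            coords = $2 " " $3 " " $4 " " $5
            if (name == "MediaBox") media = coords
            else boxes[name] = coords
        }
        END {
            for (n in boxes) if (boxes[n] != media) print n
        }'
}
```

Usage: pdfinfo -box "out(NotWorking).pdf" | boxes_differing_from_mediabox. Empty output means every box equals the MediaBox, which still does not tell you whether a /CropBox is explicitly present.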

However, this does not help in cases where the /CropBox is explicitly defined but set to the same values as the /MediaBox:

$ pdfinfo -box NotWorking.pdf

 Title:          NotWorking
 Producer:       ImageMagick 6.8.9-9 Q16 x86_64 2015-01-06 http://www.imagemagick.org
 CreationDate:   Sun May 24 00:21:28 2015
 ModDate:        Sun May 24 00:21:28 2015
 Tagged:         no
 UserProperties: no
 Suspects:       no
 Form:           none
 JavaScript:     no
 Pages:          1
 Encrypted:      no
 Page size:      500 x 500 pts
 Page rot:       0
 MediaBox:           0.00     0.00   500.00   500.00
 CropBox:            0.00     0.00   500.00   500.00
 BleedBox:           0.00     0.00   500.00   500.00
 TrimBox:            0.00     0.00   500.00   500.00
 ArtBox:             0.00     0.00   500.00   500.00
 File size:      12343 bytes
 Optimized:      no
 PDF version:    1.4

In these cases you must look into the PDF source code, or run:

for i in *.pdf ; do
   echo "$i"
   # -a: search binary as text; -o: print only the match; -P: Perl regex (lazy .*?)
   grep -a -o -P "/.*?Box.*?]" "$i" | sed 's/^/  /'
   echo
done

NotWorking.pdf
  /MediaBox [0 0 500 500]
  /CropBox [0 0 500 500]

Working.pdf
  /MediaBox [ 0 0 500 500 ]

out(NotWorking).pdf
  /Type/Page/MediaBox [0 0 300 400]
  /CropBox [0 50.0 300.0 350.0]

out(Working).pdf
  /Type/Page/MediaBox [0 0 300 400]

As you can see, the file NotWorking.pdf did have its own explicit /CropBox value pre-set already...

One more caveat, be warned:

My grep command given above will not discover the /CropBox setting in cases where the respective PDF object is obscured by being embedded into an /ObjStm object ("object stream").
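One way around that, using qpdf (a separate tool, not mentioned elsewhere in this thread), is to rewrite the file with object streams disabled before searching. This is a sketch; list_boxes is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list_boxes IN.pdf
# qpdf --object-streams=disable writes every object as a plain object instead
# of packing it into a compressed /ObjStm, so page dictionaries become
# greppable text. The temporary copy is deleted afterwards.
list_boxes() {
    local tmp status
    tmp=$(mktemp) || return 1
    qpdf --object-streams=disable "$1" "$tmp"
    status=$?
    # qpdf exits with 0 on success and 3 if it only issued warnings
    if [ "$status" -eq 0 ] || [ "$status" -eq 3 ]; then
        grep -a -o -E '/(Media|Crop|Bleed|Trim|Art)Box *\[[^]]*]' "$tmp"
        status=$?
    fi
    rm -f "$tmp"
    return "$status"
}
```

Usage: list_boxes NotWorking.pdf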

Kurt Pfeifle
  • Actually, you **can** override the per page CropBox, you just need to define the new one after Ghostscript has processed any existing one from the PDF file. So it ought to work just fine in an EndPage procedure, because the CropBox for the original PDF page is processed right at the start of the page. So doing one at the end of the page ought to override the previous one. It won't work for a BeginPage though. – KenS May 24 '15 at 19:21
  • @Kurt Thanks for this answer. Ken's answer solved it for me, but you have given me a few ideas on how to improve my program. I might give an option to disable the CropBox, or to maintain the size relationship between the CropBox and MediaBox. I'd upvote but don't have enough rep on this SE yet. – HullCityFan852 May 25 '15 at 01:44