I have a few hundred PDFs that I need to crop - I'm willing to either crop the actual documents or simply add a crop box to each so the correct viewable area shows when the PDF is opened.

How can I do this using Ghostscript (v8.71)? I found this:

gs -sDEVICE=pdfwrite -sOutputFile=marked.pdf [/CropBox [54 54 1314 810] /PAGES pdfmark original.pdf

I've tried this (and all variants I can think of) but I always get an error such as this:

Error: /undefinedfilename in ([/CropBox)

I've tried moving around the parameters of the command but nothing seems to work. Does anyone know how this can be accomplished?

Update: Still no crop box after correcting syntax, see results -

Results for: pdfinfo -box -f 1 -l 3 original.pdf

Producer:       PDFlib 7.0.2 (PHP5/Linux)
CreationDate:   Wed Oct 21 11:41:04 2009
ModDate:        Wed Oct 21 13:38:22 2009
Tagged:         no
Pages:          1
Encrypted:      no
Page    1 size: 1423 x 918 pts
Page    1 MediaBox:     0.00     0.00  1423.00   918.00
Page    1 CropBox:      0.00     0.00  1423.00   918.00
Page    1 BleedBox:    54.00    54.00  1369.00   864.00
Page    1 TrimBox:      0.00     0.00  1423.00   918.00
Page    1 ArtBox:       0.00     0.00  1423.00   918.00
File size:      914373 bytes
Optimized:      no
PDF version:    1.4


Results for: pdfinfo -box -f 1 -l 3 marked.pdf

Producer:       GPL Ghostscript 8.71
CreationDate:   Wed Apr 27 15:43:38 2011
ModDate:        Wed Apr 27 15:43:38 2011
Tagged:         no
Pages:          1
Encrypted:      no
Page    1 size: 1423 x 918 pts
Page    1 MediaBox:     0.00     0.00  1423.00   918.00
Page    1 CropBox:      0.00     0.00  1423.00   918.00
Page    1 BleedBox:     0.00     0.00  1423.00   918.00
Page    1 TrimBox:      0.00     0.00  1423.00   918.00
Page    1 ArtBox:       0.00     0.00  1423.00   918.00
File size:      392382 bytes
Optimized:      no
PDF version:    1.4

Update: Example PDFs posted -

able_to_crop.pdf
cannot_crop.pdf

Keith Pinson
Brian
  • Your Ghostscript command did "work", creating new output -- but it ignored your (wrong) pdfmark parameters (which it tried to interpret as filenames passed to it). In the output file Ghostscript made all "Boxes" the same. – Kurt Pfeifle Apr 27 '11 at 16:12

1 Answer

You are on the right track, trying to use pdfmark/Ghostscript for adding a CropBox. But your syntax isn't 100% correct.

Try this instead:

 gs \
  -sDEVICE=pdfwrite \
  -o marked.pdf \
  -c "[/CropBox [54 54 1314 810] /PAGES pdfmark" \
  -f original.pdf
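
Since the question involves a few hundred files, the same command can be wrapped in a pair of small shell functions. This is only a sketch and not part of the original answer: the helper names `crop_box` and `crop_pdf`, the uniform margin, and the `cropped/` output directory are assumptions made for illustration. It also assumes every page has the same MediaBox size (1423 x 918 pts in the `pdfinfo` output in the question). As the comments on this answer discuss, the pdfmark may have no visible effect on files whose pages already define their own CropBox.

```shell
#!/bin/sh
# Hypothetical helper: print "llx lly urx ury" for a page of WIDTH x HEIGHT
# points with MARGIN points trimmed from every edge.
crop_box() {
    width=$1 height=$2 margin=$3
    echo "$margin $margin $((width - margin)) $((height - margin))"
}

# Hypothetical helper: write a copy of INPUT to OUTPUT with a CropBox pdfmark.
# It uses only the options that appear in the answer (-sDEVICE, -o, -c, -f).
crop_pdf() {
    input=$1 output=$2 box=$3
    gs -sDEVICE=pdfwrite -o "$output" \
       -c "[/CropBox [$box] /PAGES pdfmark" \
       -f "$input"
}

# Example use (not run here):
#   mkdir -p cropped
#   box=$(crop_box 1423 918 60)      # prints: 60 60 1363 858
#   for f in *.pdf; do crop_pdf "$f" "cropped/$f" "$box"; done
```

Writing to a separate directory keeps the originals intact, which makes it easy to re-run with different margins after checking a few results with `pdfinfo -box`.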
Kurt Pfeifle
  • Thanks - now I can run the command, but the resulting PDF doesn't actually show any crop box. Also, the original.pdf has a bleed box to start with, and that's gone too. Any ideas? – Brian Apr 27 '11 at 14:18
  • @Brian: Maybe your idea of what a *CropBox* in a PDF is should come in line with the definition of CropBox in the PDF specification document? – Kurt Pfeifle Apr 27 '11 at 14:40
  • @Brian: A CropBox is that part of the PDF page which gets shown by all conforming PDF viewers by default... – Kurt Pfeifle Apr 27 '11 at 14:42
  • @Brian: In Acrobat Reader, you can enable the display of ArtBox, TrimBox and BleedBox as thin green, red and blue lines by going *"Preferences -> Page Display -> Page Content and Information"* and enabling the respective checkmarks. But if the CropBox is the innermost of all these boxes, you won't "see" any of these lines either. – Kurt Pfeifle Apr 27 '11 at 14:46
  • @Brian: Show me the output of this command: `pdfinfo -box -f 1 -l 3 original.pdf` and I'll tell you some example values you can use for your `gs` command to achieve a real CropBox effect... – Kurt Pfeifle Apr 27 '11 at 14:48
  • @pipitas: I've updated my question with the pdfinfo results. Perhaps my comment was misphrased - I know about the various boxes and such - before posting, I had also confirmed the missing crop box via Preflight. Thanks for your help, any examples would be much appreciated. – Brian Apr 27 '11 at 15:55
  • @Brian: `gs -sDEVICE=pdfwrite -sOutputFile=marked.pdf -c "[/CropBox [60 60 1363 858] /PAGES pdfmark" -f original.pdf` will *'crop'* away 60 points (==5/6 of an inch) on all 4 borders... – Kurt Pfeifle Apr 27 '11 at 16:09
  • @pipitas: I wish I could figure out why, but the resulting PDF still does not contain a crop box. Using your exact syntax on the command line, it executes successfully and leaves me at the GS> prompt, where I then type 'quit' and then run pdfinfo on the marked.pdf file... no crop box. Any ideas? Just speculating, but does the presence of the initial BleedBox affect anything? It's curious that in attempting to add the CropBox, the BleedBox disappears. – Brian Apr 27 '11 at 16:52
  • @Brian: without access to the original PDF you're using I cannot give any further comments, not even speculative ones. – Kurt Pfeifle Apr 27 '11 at 19:45
  • @Brian: if you use `-o marked.pdf` you can avoid the `GS>`-prompting. – Kurt Pfeifle Apr 27 '11 at 19:51
  • @pipitas: I tried a test where I saved this webpage as a PDF, and was able to add the CropBox - but then I simply opened that same PDF in Acrobat Pro and saved it with a new filename, and was NOT able to add the CropBox to that new file. I posted each PDF publicly on Google Docs and updated the question with links to both - if you felt like having a look, you'll be able to download them from there. Thanks again for your help. – Brian Apr 27 '11 at 22:40
  • Well the PDF from Acrobat is different in several ways. Linearized. Different PDF version. ALREADY HAS CROP BOXES. I suspect GS isn't replacing the current value successfully. I'd like to see your bad output. – Mark Storer Apr 28 '11 at 22:50
  • I'm going to guess that your output has two /CropBox entries in its page dictionaries, and that yours is second, losing out to what was already there. – Mark Storer Apr 28 '11 at 23:21
  • @Mark: Is there a way I can remove any existing crop boxes first? I'm still finding it all a bit inconsistent though - I purposely did not set a crop box on these documents. When I check for the crop box with `pdfinfo -box` it displays coords matching the doc size (see output pasted in question) - when I check in Preflight, it shows all zeroes. – Brian Apr 29 '11 at 13:26
  • As a follow-up: I found that my issue was only occurring with PDFs containing images - documents with text only were able to be cropped successfully. I eventually got things working by converting to PostScript first, then adding the crop box, then converting back to PDF. Good info about this stuff is hard to come by, so any insight would be welcomed. – Brian Apr 29 '11 at 13:31
  • Clarification to above comment: of the PDFs that **I** created (via PDFlib), those without images were able to be cropped. – Brian Apr 29 '11 at 13:43
  • @Brian: I ran my commandline against your 'cannot-crop.pdf' and looked at the result in a text editor. 'cannot-crop.pdf' already has CropBoxes of [0 0 612 792] defined for each page individually (`/Type /Page`). Ghostscript applied its CropBox statement to the PDF root object defining `/Type /Pages`. That's why it doesn't work here. I would consider filing a bug report at http://bugs.ghostscript.com/ to see if they are willing to fix it... – Kurt Pfeifle Apr 29 '11 at 15:17
  • @pipitas: Thanks, I'll submit the report to Ghostscript. For the time being, is there any way to remove the crop boxes for the individual pages so that the CropBox set by Ghostscript will work? – Brian Apr 29 '11 at 15:30
  • @Brian: Other than hand-editing the file, I can't think of any. ***IF*** your PDFs all look the same as or similar to your *'cannot-crop.pdf'*, you could apply my commandline, and then on the result run a `sed 's#/CropBox \[0 0 612\.0 792\.0\]#/CropBox [100 100 512 692]#'` (the brackets in the pattern must be escaped, otherwise sed reads them as a character class)... – Kurt Pfeifle Apr 29 '11 at 18:17
  • @Brian: Note that it takes some luck to make the replacement exactly as long as the original CropBox definition (as in my case above). You can insert as many spaces as you want in the replacement, or you could take away the space inside `/CropBox [` to make it `/CropBox[` without a problem. But if you change the total number of characters, you will see *'File is corrupted and needs to rebuild the Xref section'*-messages... – Kurt Pfeifle Apr 29 '11 at 18:21
  • Yeah... pdfmark doesn't seem to be able to modify anything that already exists, it can only add to what's already there. You COULD do a "page" pdfmark instead of a "pages" pdfmark, but you'd need to know how many pages there were in the first place. OTOH, PS is a full programming language, so you could conceivably write something that would check the page count and Work Correctly. In Theory. – Mark Storer May 02 '11 at 16:56
  • An answer to [Cropping a PDF using Ghostscript 9.01](http://stackoverflow.com/questions/6183479/cropping-a-pdf-using-ghostscript-9-01) says that the command line argument `/CropBox` is ignored if it's already defined in the file. The workaround is to preprocess the file and change the case, e.g. to `/crOPbOX`, so the internal setting is ignored. – matt wilkie Apr 05 '16 at 16:02
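
The same-length `sed` substitution suggested in the comments above depends on the replacement having exactly as many characters as the text it replaces, so that the byte offsets recorded in the PDF's xref table stay valid. The sketch below shows one way to pad a shorter replacement with spaces. The helper name `pad_to_length` and the example box values are hypothetical, and the approach assumes the `/CropBox` entries are stored as plain, uncompressed ASCII text in the file.

```shell
#!/bin/sh
# Hypothetical helper: pad NEW with trailing spaces until it is as long as OLD,
# so an in-place substitution leaves every byte offset in the file unchanged.
# Returns status 1 (and prints nothing) if NEW is longer than OLD.
pad_to_length() {
    old=$1 new=$2
    [ "${#new}" -le "${#old}" ] || return 1
    printf "%-${#old}s" "$new"
}

# Example use (not run here); brackets and dots are escaped in the sed pattern:
#   new=$(pad_to_length '/CropBox [0 0 612.0 792.0]' '/CropBox [60 60 552 732]')
#   LC_ALL=C sed "s#/CropBox \[0 0 612\.0 792\.0\]#$new#" marked.pdf > cropped.pdf
```

Padding goes after the closing bracket, where extra whitespace is harmless to a PDF parser. If the new box text is longer than the old one the helper fails instead of producing a file with shifted offsets.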