
I've written a script that uses ImageMagick to convert each page of a PDF into a JPG, along with resizing etc.

Where it gets slightly trickier is that some PDFs have the middle two-page spread as "one page", so it is extra wide. Is there ANY way to "detect" this and crop the left and right sides as two separate pages?

Kaitlyn2004

1 Answer


Assuming you want to use ImageMagick (and only ImageMagick) for this: that can't be done. ImageMagick cannot process PDF input all by itself. It has to make use of Ghostscript anyway, so without a local Ghostscript installation it won't work. (You will not necessarily notice Ghostscript at work when you feed PDF input to ImageMagick unless you add -verbose to its command line, because ImageMagick delegates the job to Ghostscript behind your back.)

Your question has two parts:

  • "Is there a way to 'detect' extra-wide pages, like the center spreads?"
  • "Is there a way to crop the left and right parts of center spreads as two separate pages?"

Detect page sizes

You can use ImageMagick's identify to detect the page sizes of a PDF.

Just run the simplest command:

identify multipage.pdf

The output will be something like

multipage.pdf[0] PDF 595x792 595x792+0+0 16-bit Bilevel DirectClass 59.5KB 0.000u 0:00.000
multipage.pdf[1] PDF 595x792 595x792+0+0 16-bit Bilevel DirectClass 59.5KB 0.000u 0:00.000
multipage.pdf[2] PDF 595x792 595x792+0+0 16-bit Bilevel DirectClass 59.5KB 0.000u 0:00.000
multipage.pdf[3] PDF 595x792 595x792+0+0 16-bit Bilevel DirectClass 59.5KB 0.000u 0:00.000

The page index in the output is 0-based, so [0] indicates the first page, [1] the second page, and so on.

To make the output a bit more readable, you could do this:

identify -format '%f, page %s + 1: %W x %H\n' multipage.pdf

and get

multipage.pdf, page  0 + 1: 595 x 792
multipage.pdf, page  1 + 1: 595 x 792
multipage.pdf, page  2 + 1: 595 x 792
multipage.pdf, page  3 + 1: 595 x 792

For a double-spread page the respective output should be 1190 x 792 or similar.
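A minimal sketch of automating this check in your script: the shell function below (its name, `wide_pages_from_identify`, is made up for illustration) reads lines of `scene width height` and prints the 1-based number of every page that is wider than it is tall. Treating "wider than tall" as "double spread" is an assumption; a genuinely landscape single page would be flagged too.

```shell
# Hypothetical helper: read "<0-based scene> <width> <height>" lines on stdin
# and print the 1-based page number of every page whose width exceeds its height.
wide_pages_from_identify() {
  awk '$2 > $3 { print $1 + 1 }'
}
```

Usage: identify -format '%s %W %H\n' multipage.pdf | wide_pages_from_identify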

However, be warned: using ImageMagick to query the page sizes of PDF files is very slow. Therefore, better use a different tool for this sub-task: pdfinfo. It will be faster by several orders of magnitude:

pdfinfo -f 1 -l 1000 multipage.pdf

will output (among other metadata lines)

Pages:          4
Page    1 size: 595 x 792 pts
Page    1 rot:  0
Page    2 size: 595 x 792 pts
Page    2 rot:  0
Page    3 size: 595 x 792 pts
Page    3 rot:  0
Page    4 size: 595 x 792 pts
Page    4 rot:  0
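The same check can be sketched against pdfinfo's output instead. The hypothetical function `wide_pages_from_pdfinfo` picks out the `Page N size: W x H pts` lines and prints N whenever W > H. It ignores the `rot:` lines, so a page that is rotated by 90 degrees may be judged by its unrotated size; adapt that if your files use page rotation.

```shell
# Hypothetical helper: parse pdfinfo's per-page "Page N size: W x H pts" lines
# from stdin and print N for every page that is wider than it is tall.
# Fields: $1="Page", $2=N, $3="size:", $4=W, $5="x", $6=H
wide_pages_from_pdfinfo() {
  awk '$1 == "Page" && $3 == "size:" && $4 > $6 { print $2 }'
}
```

Usage: pdfinfo -f 1 -l 1000 multipage.pdf | wide_pages_from_pdfinfo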

If you need additional info about the pages' ArtBox, TrimBox, BleedBox and CropBox values, just add -box to the command line.

As I said: pdfinfo is significantly faster in identifying page sizes for PDFs than ImageMagick is. Use the right tool for the job.

Crop left and right parts of a page

Now that you have identified the large double-spread page, you could use one of the following methods (based on Ghostscript) to split the pages down the middle:

Adapting the method described in the links above will result in two PDF pages that still contain all their original vector and font info.
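As an illustration of the Ghostscript route, here is a sketch of one commonly used technique: render the page onto fixed media half as wide as the spread, and for the right half shift the content left with `PageOffset`. The function name and argument order are hypothetical, it assumes the spread consists of exactly two equal pages side by side, and depending on your Ghostscript version the media options may need tweaking.

```shell
# Hypothetical helper: split one double-spread page into two vector PDF pages.
# Arguments: input PDF, 1-based page number, half-width and height of the
# spread in points, output prefix. Assumes two equal pages side by side.
split_spread_gs() {
  gs_src=$1 gs_page=$2 gs_half_w=$3 gs_h=$4 gs_prefix=$5
  # Left half: media fixed to half the spread width, content not shifted.
  gs -o "${gs_prefix}-left.pdf" -sDEVICE=pdfwrite \
     -dFirstPage="$gs_page" -dLastPage="$gs_page" \
     -dDEVICEWIDTHPOINTS="$gs_half_w" -dDEVICEHEIGHTPOINTS="$gs_h" -dFIXEDMEDIA \
     -c "<</PageOffset [0 0]>> setpagedevice" \
     -f "$gs_src"
  # Right half: same media, content shifted left by one page width.
  gs -o "${gs_prefix}-right.pdf" -sDEVICE=pdfwrite \
     -dFirstPage="$gs_page" -dLastPage="$gs_page" \
     -dDEVICEWIDTHPOINTS="$gs_half_w" -dDEVICEHEIGHTPOINTS="$gs_h" -dFIXEDMEDIA \
     -c "<</PageOffset [-$gs_half_w 0]>> setpagedevice" \
     -f "$gs_src"
}
```

Usage: split_spread_gs multipage.pdf 16 595 842 page16 (writes page16-left.pdf and page16-right.pdf).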

Alternatively, you can use ImageMagick. Assume your double-spread page measures 1190x842 pt (two A4 pages of 595x842 pt side by side) and is page 16 of the original PDF, which translates to [15] for ImageMagick. At ImageMagick's default density of 72 dpi, one pixel corresponds to one point, so the point values can be used directly as crop geometry. Your convert commands could be something like:

convert 'multipage.pdf[15]' -crop 595x842+0+0 +repage page16-left.png
convert 'multipage.pdf[15]' -crop 595x842+595+0 +repage page16-right.png

The result gives you two raster images.
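A variant of the two convert commands, sketched as a reusable function (the name `split_spread_im` is hypothetical). Fixed pixel geometry only matches the point values at the default 72 dpi; if you raise `-density` for sharper output, those numbers would have to scale too. Cropping by percentage, `-crop 50%x100%` without an offset, sidesteps that by cutting whatever size the page rasterizes to into two equal tiles. ImageMagick 7 also accepts `magick` in place of `convert`.

```shell
# Hypothetical helper: rasterize one double-spread page and cut it in half.
# Arguments: input PDF, 1-based page number, density (dpi), output prefix.
# Writes <prefix>-0.png (left half) and <prefix>-1.png (right half).
split_spread_im() {
  im_src=$1 im_page=$2 im_density=$3 im_prefix=$4
  # ImageMagick page indices are 0-based, so page 16 becomes [15].
  # -density must precede the input so it affects PDF rasterization.
  # -crop 50%x100% (no offset) tiles the image into two equal halves;
  # +repage drops the virtual-canvas offset the crop leaves behind.
  convert -density "$im_density" "${im_src}[$((im_page - 1))]" \
    -crop 50%x100% +repage "${im_prefix}-%d.png"
}
```

Usage: split_spread_im multipage.pdf 16 150 page16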

Kurt Pfeifle