Is it possible to get the page size (e.g. of a page in a PDF document) using Ghostscript? I have seen the "bbox" device, but it returns the bounding box of the page content (which differs per page), not the TrimBox (or CropBox) of the PDF pages. (See http://www.prepressure.com/pdf/basics/page_boxes for info about page boxes.) Is there any other possibility?

Aristoteles

3 Answers

Unfortunately, it doesn't seem to be easy to get the (possibly different) page sizes (or *Boxes, for that matter) inside a PDF with the help of Ghostscript.

But since you asked for other possibilities as well: a rather reliable way to determine the media size of each page (and even each of the embedded {Trim,Media,Crop,Bleed}Boxes) is the command-line tool pdfinfo.exe. This utility is part of the Xpdf tools from http://www.foolabs.com/xpdf/download.html . Run the tool with the "-box" parameter to print the page boxes; "-f 3" tells it to start at page 3 and "-l 8" to stop at page 8.

Example output:

C:\downloads>pdfinfo -box -f 1 -l 3 _IXUS_850IS_ADVCUG_EN.pdf
Creator:        FrameMaker 6.0
Producer:       Acrobat Distiller 5.0.5 (Windows)
CreationDate:   08/17/06 16:43:06
ModDate:        08/22/06 12:20:24
Tagged:         no
Pages:          146
Encrypted:      no
Page    1 size: 419.535 x 297.644 pts
Page    2 size: 297.646 x 419.524 pts
Page    3 size: 297.646 x 419.524 pts
Page    1 MediaBox:     0.00     0.00   595.00   842.00
Page    1 CropBox:     87.25   430.36   506.79   728.00
Page    1 BleedBox:    87.25   430.36   506.79   728.00
Page    1 TrimBox:     87.25   430.36   506.79   728.00
Page    1 ArtBox:      87.25   430.36   506.79   728.00
Page    2 MediaBox:     0.00     0.00   595.00   842.00
Page    2 CropBox:    148.17   210.76   445.81   630.28
Page    2 BleedBox:   148.17   210.76   445.81   630.28
Page    2 TrimBox:    148.17   210.76   445.81   630.28
Page    2 ArtBox:     148.17   210.76   445.81   630.28
Page    3 MediaBox:     0.00     0.00   595.00   842.00
Page    3 CropBox:    148.17   210.76   445.81   630.28
Page    3 BleedBox:   148.17   210.76   445.81   630.28
Page    3 TrimBox:    148.17   210.76   445.81   630.28
Page    3 ArtBox:     148.17   210.76   445.81   630.28
File size:      6888764 bytes
Optimized:      yes
PDF version:    1.4
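
If you need the box values in a script rather than on screen, the per-page lines above are regular enough to parse. The following is a minimal Python sketch, assuming the output layout shown above; the names `parse_pdfinfo_boxes` and `box_size` are made up for this illustration, and the sample text is copied from the output above.

```python
import re

# Matches lines such as "Page    1 CropBox:     87.25   430.36   506.79   728.00".
# Lines like "Page    1 size: ..." do not match because they lack a "...Box:" label.
BOX_LINE = re.compile(
    r"^Page\s+(\d+)\s+(\w+Box):\s+"
    r"(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)\s*$"
)


def parse_pdfinfo_boxes(text):
    """Return {page_number: {box_name: (llx, lly, urx, ury)}} from `pdfinfo -box` output."""
    pages = {}
    for line in text.splitlines():
        match = BOX_LINE.match(line.strip())
        if match:
            page = int(match.group(1))
            coords = tuple(float(value) for value in match.groups()[2:])
            pages.setdefault(page, {})[match.group(2)] = coords
    return pages


def box_size(box):
    """Width and height of a (llx, lly, urx, ury) box, in points."""
    llx, lly, urx, ury = box
    return urx - llx, ury - lly


# Sample lines copied from the pdfinfo output shown above.
SAMPLE = """\
Page    1 size: 419.535 x 297.644 pts
Page    1 MediaBox:     0.00     0.00   595.00   842.00
Page    1 CropBox:     87.25   430.36   506.79   728.00
Page    2 MediaBox:     0.00     0.00   595.00   842.00
Page    2 CropBox:    148.17   210.76   445.81   630.28
"""

if __name__ == "__main__":
    for page, boxes in sorted(parse_pdfinfo_boxes(SAMPLE).items()):
        width, height = box_size(boxes["CropBox"])
        print(f"page {page}: CropBox {width:.2f} x {height:.2f} pt")
```

For page 1 the CropBox width and height computed this way agree, up to rounding, with the "Page 1 size" line that pdfinfo prints.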
Kurt Pfeifle

A solution in pure PostScript, run by Ghostscript itself, with no additional scripts necessary:

gs -dQUIET -sFileName=path/to/file.pdf -c "FileName (r) file runpdfbegin 1 1 pdfpagecount {pdfgetpage /MediaBox get {=print ( ) print} forall (\n) print} for quit"

The command prints the MediaBox of each page in the PDF as four numbers per line. An example from a 3-page PDF:

0 0 595 841
0 0 595 841
0 0 595 841
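
Each line is a box in the order lower-left x, lower-left y, upper-right x, upper-right y. PDF expresses these in default user-space units, which are 1/72 inch (points) unless the page sets a /UserUnit. The Python sketch below converts such a box to millimetres and swaps width and height for a page whose /Rotate entry is 90 or 270. The function name `page_size_mm` and its parameters are illustrative only, and the /Rotate and /UserUnit values would have to be obtained separately, since the command above prints only the box.

```python
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4


def page_size_mm(box, rotate=0, user_unit=1.0):
    """Convert a PDF box [llx, lly, urx, ury] to (width, height) in millimetres.

    rotate:    the page's /Rotate value (a multiple of 90); for 90 and 270
               the displayed width and height are swapped.
    user_unit: the page's /UserUnit; 1.0 means one unit is 1/72 inch.
    """
    llx, lly, urx, ury = box
    width = abs(urx - llx) * user_unit
    height = abs(ury - lly) * user_unit
    if rotate % 180 == 90:
        width, height = height, width
    scale = MM_PER_INCH / POINTS_PER_INCH
    return width * scale, height * scale


if __name__ == "__main__":
    # The MediaBox from the example output above.
    width, height = page_size_mm([0, 0, 595, 841])
    print(f"{width:.1f} x {height:.1f} mm")
```

For the box 0 0 595 841 this gives roughly 209.9 x 296.7 mm, i.e. close to A4.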

Here's a breakdown of the command:

FileName (r) file  % open file given by -sFileName
runpdfbegin        % open file as pdf
1 1 pdfpagecount { % for each page index
  pdfgetpage       % get pdf page properties (pushes a dict)
  /MediaBox get    % get MediaBox value from dict (pushes an array of numbers)
  {                % for every array element
    =print         % print element value
    ( ) print      % print single space
  } forall
  (\n) print       % print new line
} for
quit               % quit interpreter. Not necessary if you pass -dBATCH to gs

Replace /MediaBox with /CropBox to get the crop box.
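
To call this from a script, something along the following lines may work. It is only a sketch: the helper names (`build_gs_command`, `parse_boxes`, `page_boxes`) are invented for this example, it assumes a `gs` executable on the PATH, and it adds `-dNODISPLAY` and `-dNOSAFER`, which the comments below report as necessary on some setups. Note that `-dNOSAFER` lifts Ghostscript's file-access restrictions, so use it only with files you trust.

```python
import subprocess

# The PostScript program from above, with the box name as a placeholder.
# Doubled braces are literal braces for str.format; "\\n" reaches gs as \n.
PS_TEMPLATE = (
    "FileName (r) file runpdfbegin "
    "1 1 pdfpagecount {{pdfgetpage /{box} get "
    "{{=print ( ) print}} forall (\\n) print}} for quit"
)


def build_gs_command(pdf_path, box="MediaBox", gs="gs"):
    """Build the argument list for the Ghostscript call (no shell quoting needed)."""
    return [
        gs, "-dQUIET", "-dNODISPLAY", "-dNOSAFER",
        f"-sFileName={pdf_path}",
        "-c", PS_TEMPLATE.format(box=box),
    ]


def parse_boxes(output):
    """Turn lines of four numbers into a list of (llx, lly, urx, ury) tuples."""
    boxes = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 4:
            boxes.append(tuple(float(field) for field in fields))
    return boxes


def page_boxes(pdf_path, box="MediaBox", gs="gs"):
    """Run Ghostscript and return one box per page; raises if gs exits with an error."""
    result = subprocess.run(
        build_gs_command(pdf_path, box, gs),
        capture_output=True, text=True, check=True,
    )
    return parse_boxes(result.stdout)


if __name__ == "__main__":
    # Parsing the example output shown above, without invoking gs.
    print(parse_boxes("0 0 595 841 \n0 0 595 841 \n0 0 595 841 \n"))
```

Passing the arguments as a list avoids the quoting differences between shells that the one-line command is sensitive to.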

Stefan Dragnev
  • Nice answer! And Nice breakthrough! I was getting an error here though (cannot open X display `:0.0`), which can be fixed by opening an X server, or by adding `-dNODISPLAY` to the call (better, since we don't need X anyways). – Gus Neves Oct 04 '18 at 15:56
  • This command fails if your pages are rotated. It outputs width for height and vice versa. – Behlül Jul 30 '20 at 22:32
  • It is giving me this error.
    ```
    Error: /invalidfileaccess in --file--
    Operand stack: (Chapter10.pdf) (r)
    Execution stack: %interp_exit .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval-- --nostringval-- --nostringval-- false 1 %stopped_push .runexec2 --nostringval-- --nostringval-- --nostringval-- 2 %stopped_push --nostringval--
    Dictionary stack: --dict:729/1123(ro)(G)-- --dict:0/20(G)-- --dict:75/200(L)--
    Current allocation mode is local
    Last OS error: Permission denied
    ```
    – Atif Ali Jan 10 '22 at 12:06
  • What unit of measurement are the numbers in? – robertspierre Jan 15 '22 at 13:59
  • @AtifAli you need option `-dNOSAFER` – robertspierre Jan 16 '22 at 08:02

Meanwhile, I found a different method. This one uses Ghostscript only (just as you required). No need for additional third-party utilities.

This method uses a small helper program, written in PostScript, that ships with the Ghostscript source code. Look for the file pdf_info.ps in the toolbin subdirectory (depending on the version, it may be in lib instead, as in the example run further down).

The comments in the file say to run it like this to list the fonts used and the media sizes used:

gswin32c -dNODISPLAY ^
   -q ^
   -sFile=____.pdf ^
   [-dDumpMediaSizes] ^
   [-dDumpFontsUsed [-dShowEmbeddedFonts]] ^
   toolbin/pdf_info.ps

I ran it on a local example file, with command-line parameters that ask for the media sizes only (not the fonts used). Here is the result:

C:\> gswin32c ^
      -dNODISPLAY ^
      -q ^
      -sFile=c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf ^
      -dDumpMediaSizes ^
      C:/gs8.71/lib/pdf_info.ps


  c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf has 146 pages.
  Creator: FrameMaker 6.0
  Producer: Acrobat Distiller 5.0.5 (Windows)
  CreationDate: D:20060817164306Z
  ModDate: D:20060822122024+02'00'

  Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
  Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  Page 3 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  Page 4 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  [....]
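
The per-page lines can be picked apart in a script as well. Below is a minimal Python sketch that assumes the line layout shown above, where each box is reported as a width and height pair; the function name `parse_pdf_info_sizes` is made up for this illustration. It also assumes that a page without a CropBox simply lacks that part of the line, which is an assumption rather than something the output above demonstrates.

```python
import re

NUMBER = r"(-?\d+(?:\.\d+)?)"
SIZE = r"\[\s*" + NUMBER + r"\s+" + NUMBER + r"\s*\]"
# Matches "Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]";
# the CropBox part is treated as optional (assumption).
PAGE_LINE = re.compile(
    r"Page\s+(\d+)\s+MediaBox:\s*" + SIZE + r"(?:\s+CropBox:\s*" + SIZE + r")?"
)


def parse_pdf_info_sizes(text):
    """Return {page_number: {"MediaBox": (w, h), "CropBox": (w, h)}} in points."""
    pages = {}
    for line in text.splitlines():
        match = PAGE_LINE.search(line)
        if not match:
            continue  # header lines, blank lines, "[....]" and so on
        page = int(match.group(1))
        sizes = {"MediaBox": (float(match.group(2)), float(match.group(3)))}
        if match.group(4) is not None:
            sizes["CropBox"] = (float(match.group(4)), float(match.group(5)))
        pages[page] = sizes
    return pages


# Sample lines copied from the pdf_info.ps output shown above.
SAMPLE = """\
  Producer: Acrobat Distiller 5.0.5 (Windows)

  Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
  Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  Page 3 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  [....]
"""

if __name__ == "__main__":
    for page, sizes in sorted(parse_pdf_info_sizes(SAMPLE).items()):
        print(page, sizes)
```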
Kurt Pfeifle
  • Does ghostscript still ship with `pdf_info.ps`? If not, where would be a good place to get a copy? –  Mar 31 '14 at 18:03
  • You can look for it in Ghostscript's Git repository: [http://git.ghostscript.com/?p=ghostpdl.git;a=summary](http://git.ghostscript.com/?p=ghostpdl.git;a=summary). Or try **[this direct link](http://git.ghostscript.com/?p=ghostpdl.git;a=blob_plain;f=gs/toolbin/pdf_info.ps;hb=HEAD)**. – Kurt Pfeifle Apr 01 '14 at 11:06
  • Thanks! I'd found a copy somewhere, but I don't think it was as up to date. –  Apr 01 '14 at 13:07
  • Can no longer find it in the git repo - at least not via google. And, on ubuntu the /usr/share/ghostscript/9.18/lib directory does not contain it. Is there an alternative? (Alternative location or program?) – Diagon Oct 10 '16 at 19:20
  • @Diagon It looks like the file is still available in the git repository. If you go to "tree" view on the repo, then navigate to the "toolbin" folder you will find it in there. – blendenzo Mar 22 '17 at 19:20
  • Ah yes, there it is. Thank you. – Diagon Mar 25 '17 at 06:41
  • The file `pdf_info.ps` has apparently been moved into the `[ghostpdl.git]/lib` subfolder – robertspierre Jan 16 '22 at 07:57
  • Note you need option `-dNOSAFER` to open a file from command line too – robertspierre Jan 16 '22 at 08:02