
I need to crop a certain section of my PDF file to a PNG (this will be automated using Ghostscript with PHP). This is what I do now, which basically converts the first page of a PDF to a PNG:

gs -q -dNOPAUSE -dBATCH \
   -sDEVICE=pngalpha -dEPSCrop \
   -sOutputFile=output.png input.pdf

Specifically, I'm trying to crop this top left card to a PNG. I'm also open to other suggestions on how to accomplish this.

Kurt Pfeifle
Tom

2 Answers


First,
determine the bounding box of your first PDF page:

gs                          \
 -q                         \
 -dBATCH                    \
 -dNOPAUSE                  \
 -sDEVICE=bbox              \
 -dLastPage=1               \
  stackoverflowQuestion.pdf \
2>&1                        \
| grep %%BoundingBox

The resulting output will be:

%%BoundingBox: 119 531 464 814

It means:

  • the lower left corner of the bounding box is at coordinate (119,531)
  • the upper right corner of the bounding box is at coordinate (464,814)

The values are in PostScript points (where 72 pt == 1 inch). The bounding box is the smallest rectangle that encloses all graphical PDF objects that leave ink or toner marks on the page.
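If the values are needed in a script, the line can be split into its four coordinates. This is only a minimal sketch in POSIX shell: the variable names (`bbox_line`, `llx`, `lly`, `urx`, `ury`) are made up for illustration, and the sample line is hard-coded here instead of being captured from the `gs` call above.

```shell
#!/bin/sh
# Sample line as printed by the bbox device (hard-coded for illustration;
# in practice it would come from the gs ... | grep %%BoundingBox pipeline).
bbox_line='%%BoundingBox: 119 531 464 814'

# Split on whitespace: the first field is the "%%BoundingBox:" label,
# the next four are lower-left x/y and upper-right x/y in points.
read -r _label llx lly urx ury <<EOF
$bbox_line
EOF

echo "lower left: ($llx,$lly)  upper right: ($urx,$ury)"
```

A here-document is used instead of a pipe so that the variables stay set in the current shell.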

Then,
create your PNG.

Judging from the bounding box values, you seem to want the image 345 pt wide (= 464 - 119) and 283 pt high (= 814 - 531). This leads to a page size of -g345x283. The value is given in pixels, but Ghostscript uses 72 dpi for image output unless told otherwise, so 72 px == 1 inch and the pixel count equals the point count.

Or better, keep a safety margin of 1 pt around the bounding box. That makes the image slightly bigger than the bare minimum and gives this image dimension: -g347x285.

You also need to cut off 119 pt from the left edge (118 pt with the safety margin) and 531 pt from the bottom edge (530 pt with the safety margin).

Hence the command would be:

gs                                                      \
  -o out.png                                            \
  -sDEVICE=pngalpha                                     \
  -g347x285                                             \
  -dLastPage=1                                          \
  -c "<</Install {-118 -530 translate}>> setpagedevice" \
  -f stackoverflowQuestion.pdf 
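For automation, the arithmetic behind the -g value and the translate offsets in the command above can be scripted. The following is a rough sketch, not part of the original commands: the variable names are invented for illustration, and the bounding box values are hard-coded from the example.

```shell
#!/bin/sh
# Bounding box reported by the bbox device (values from the example above).
llx=119 lly=531 urx=464 ury=814
margin=1   # safety margin in points on every side (an assumption; adjust as needed)

# Output size: bounding box extent plus the margin on both sides.
width=$(( urx - llx + 2 * margin ))
height=$(( ury - lly + 2 * margin ))

# Translation that moves the lower left corner (minus the margin) to the origin.
tx=$(( margin - llx ))
ty=$(( margin - lly ))

geometry="-g${width}x${height}"
install="<</Install {${tx} ${ty} translate}>> setpagedevice"

echo "$geometry"
echo "$install"
```

The two resulting strings correspond to the -g and -c arguments of the command above; at the default 72 dpi, points and pixels coincide, so no further scaling is needed.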

Here is the resulting PNG:

out.png

For better PNG quality, increase the resolution from the default 72 dpi to 720 dpi. The pixel dimensions passed to -g grow by the same factor of 10, while the translate offsets stay in points:

gs                                                      \
  -o out720dpi.png                                      \
  -sDEVICE=pngalpha                                     \
  -r720                                                 \
  -g3470x2850                                           \
  -dLastPage=1                                          \
  -c "<</Install {-118 -530 translate}>> setpagedevice" \
  -f stackoverflowQuestion.pdf 
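For resolutions other than 72 or 720 dpi, the -g value can be derived from the size in points. A small sketch, assuming the 347 x 285 pt crop from above; the variable names are hypothetical, and rounding up is a choice made here so that the crop is never a pixel too small.

```shell
#!/bin/sh
# Crop size in points (bounding box plus safety margin, from the example above).
width_pt=347 height_pt=285
dpi=720

# pixels = points * dpi / 72, rounded up with integer arithmetic.
width_px=$(( (width_pt * dpi + 71) / 72 ))
height_px=$(( (height_pt * dpi + 71) / 72 ))

echo "-r${dpi} -g${width_px}x${height_px}"
```

With dpi=720 this reproduces the -r720 -g3470x2850 pair used in the command above.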

Update:

On Windows in a CMD window, the console application names for Ghostscript are gswin32c.exe and/or gswin64c.exe (instead of gs). Also, you'd have to use ^ as a line continuation character (instead of \).

Kurt Pfeifle
  • Wow never seen a detailed answer like this before! thank you so much Kurt – Tom Sep 19 '12 at 07:38
  • @Tom: Look at my other answers -- I frequently do it like that. ;-) – Kurt Pfeifle Sep 19 '12 at 07:50
  • Just to be on the safe side, isn't (119,531) the upper left and (464,814) lower right? – Tom Sep 19 '12 at 08:46
  • @Tom: No. PostScript's (and PDF's) graphics model starts its coordinate system from the lower left corner. (And I did really run those commands in my answer to get the resulting picture.) – Kurt Pfeifle Sep 19 '12 at 09:19
  • that is very interesting Kurt. I'm now digging in the man page to learn more, thank you! – Tom Sep 19 '12 at 15:21
  • Ah... I don't think there is a good "man page" for Ghostscript. Better start at [Use.htm](http://git.ghostscript.com/?p=ghostpdl.git;a=blob_plain;f=gs/doc/Use.htm), [Ps2pdf.htm](http://git.ghostscript.com/?p=ghostpdl.git;a=blob_plain;f=gs/doc/Ps2pdf.htm) and [Devices.htm](http://git.ghostscript.com/?p=ghostpdl.git;a=blob_plain;f=gs/doc/Devices.htm). – Kurt Pfeifle Sep 19 '12 at 19:07
  • How to get PowerShell to write the bounding box to the console? – Alan Jul 02 '16 at 15:18
  • @Alan: Comments are the wrong location to ask new questions! Also, the question you ask is the wrong one. You should rather ask: *"How to run an external CLI command (which works in CMD) in PowerShell?"*. And you should ask it as your own, separate question.... – Kurt Pfeifle Jul 02 '16 at 15:22
  • @KurtPfeifle I don't see why it's a separate question. The original question did not specify bash. I'm just asking you to elaborate the first paragraph of your answer so that it is not so platform dependent. (I already understand that I can replace `grep` with `select-string`.) – Alan Jul 02 '16 at 17:52
  • @Alan: The original question did specify as the command name `gs`. This indirectly also ***does*** specify Bash (because `gs` is Ghostscript's executable on Linux, Unix and Mac OS X). My concession to you is the update to my answer from 3 hours ago. – Kurt Pfeifle Jul 02 '16 at 18:54
  • I'm wondering why there isn't just a simple method to specify the crop area in pdf points – ceztko May 31 '18 at 22:56
  • @ceztko: Then tell me please if you find one. It should work on Windows, macOS, Linux and Unix, and preferably be open source... – Kurt Pfeifle Jun 01 '18 at 12:24
  • @KurtPfeifle you answered all the related questions here in SO, so if you don't know this is really a matter of getting support from ghostscript devs. They are here in SO as well. Maybe they will answer to a specific question on this, if there's actually a method to crop a pdf using units instead requiring manual conversion to pixel using `-g` and `-r` and specifying the `/Install` directive for the translation, for which I couldn't find the documentation anywhere. Works anyway, thanks. – ceztko Jun 01 '18 at 13:04

On Windows the console application names for Ghostscript are gswin32c.exe and/or gswin64c.exe (instead of gs).

1. CMD window

In a CMD window you have to use ^ as a line continuation character (instead of \). Also, grep may not be available, so use findstr instead. Finally, if gswin32c.exe or gswin64c.exe is not in your %PATH%, and if the full path contains a space, you have to quote it:

"c:\program files\ghostscript\gswin64c.exe" ^
 -q                         ^
 -dBATCH                    ^
 -dNOPAUSE                  ^
 -sDEVICE=bbox              ^
 -dLastPage=1               ^
  stackoverflowQuestion.pdf ^
2>&1                        ^
| findstr %%BoundingBox

2. PowerShell window

In a PowerShell window, just quoting the full path to the executable will not work. You have to run:

& "c:\program files\ghostscript\gswin64c.exe" -q -o nul: -sDEVICE=bbox my.pdf
Kurt Pfeifle