I want to extract an area given by x-y coordinates from a pdf page. The extracted area may be stored as a page in a new pdf document. This needs to be done several times and so I would want the process to be scripted. Are there any tools / libraries that can help do this?
With freely available (some subject to AGPL, some to LGPL) tools (actually Java libraries) I only have an idea how to make an existing page an invisible PDF XObject from which some parts (given by coordinates and size) can be displayed on a new PDF page. This implies, though, that the whole original page is present in the resulting PDF and (even though not visible) can be accessed by someone knowing his way around in PDF files. Would such a method work for you? – mkl Mar 31 '13 at 10:12
Sure, I think you should post your answer. The answer I have will also probably retain all the contents. – r.v Mar 31 '13 at 18:04
3 Answers
If iText (for Java) or iText(Sharp) (for .Net) are acceptable libraries for you, you can use them to import an existing page from some PDF as a template of which sections can be displayed in another PDF.
Have a look at the example TilingHero.java / TilingHero.cs from chapter 6 of iText in Action — 2nd Edition. The central code is:
    PdfImportedPage page = writer.getImportedPage(reader, 1);
    // adding the same page 16 times with a different offset
    float x, y;
    for (int i = 0; i < 16; i++) {
        x = -pagesize.getWidth() * (i % 4);
        y = pagesize.getHeight() * (i / 4 - 3);
        content.addTemplate(page, 4, 0, 0, 4, x, y);
        document.newPage();
    }
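As you can see, the original page is imported once and different sections of it are displayed on different pages: the matrix (4, 0, 0, 4, x, y) scales the imported page by a factor of 4 and shifts it by (x, y), so each new page shows a different one of the 16 tiles.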
(iText and iTextSharp are available either for free, subject to the AGPL, or under a commercial license.)

You can use 'pdftoppm' for this task:
    pdftoppm -f <first page> -l <last page> -jpeg -x <start x> -y <start y> -W <width> -H <height> <in file> > <out file>
For example, to crop an area of the first PDF page whose upper left corner is at (x,y) = (100,200), with a width of 50 and a height of 80, and save it to a JPEG file, use:
    pdftoppm -f 1 -l 1 -jpeg -x 100 -y 200 -W 50 -H 80 'my.pdf' > 'crop.jpg'
The crop values are counted in pixels of the rendered image, not in PDF units, so they depend on the rendering resolution. If you run into trouble with your document's resolution, you can set it with the '-r' option of 'pdftoppm' (see the man page of 'pdftoppm' for more).
Certainly, you can easily convert the JPEG file into a PDF, if needed.
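Because the crop options are counted in pixels of the rendered image, a script that starts from PDF coordinates usually has to convert them first. The following is a minimal sketch in POSIX shell, assuming the area is given in whole PDF points (1/72 inch) measured from the upper left corner of the page. The helper names pt_to_px and crop_region and the PDFTOPPM override variable are made up for illustration and are not part of pdftoppm.

```shell
#!/bin/sh
# Hypothetical helpers, not part of pdftoppm itself.

# Convert a length in PDF points (1/72 inch) to pixels at a given DPI,
# rounded to the nearest pixel using integer arithmetic.
# $1 = length in points, $2 = resolution in DPI
pt_to_px() {
    echo $(( ($1 * $2 + 36) / 72 ))
}

# crop_region <pdf> <page> <dpi> <x> <y> <width> <height>
# x, y, width and height are whole numbers of points; (x, y) is the
# upper left corner of the area. The JPEG is written to stdout.
# Set PDFTOPPM to use a different binary (assumption: defaults to pdftoppm on PATH).
crop_region() {
    "${PDFTOPPM:-pdftoppm}" -f "$2" -l "$2" -r "$3" -jpeg \
        -x "$(pt_to_px "$4" "$3")" -y "$(pt_to_px "$5" "$3")" \
        -W "$(pt_to_px "$6" "$3")" -H "$(pt_to_px "$7" "$3")" \
        "$1"
}
```

A call such as crop_region my.pdf 1 150 100 200 50 80 > crop.jpg would then render page 1 at 150 DPI and keep only the requested area. Rounding each value separately can move an edge by a pixel, which is usually acceptable for this kind of extraction.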

Using Ghostscript, you can crop the PDF the following way:
    gs -o final.pdf -sDEVICE=pdfwrite \
       -c "[/CropBox [x-left y-bottom x-right y-top] /PAGES pdfmark" \
       -f original.pdf
Substitute x-left, y-bottom, etc. with the required coordinates, given in points (1/72 inch). Note that for gs, coordinates (0, 0) are at the bottom left of the page. The options and the pdfmark come before the input file because Ghostscript processes its arguments in order.
This can then be easily scripted.
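As a rough illustration of such a script, here is a minimal sketch in POSIX shell. It assumes the area is given as x, y, width and height in whole points measured from the top-left corner, and that the page height is known in advance (for example 792 points for US Letter). The helper names cropbox and crop_pdf and the GS override variable are hypothetical. The sketch places the pdfmark before the input file so that it is in effect when the file is processed.

```shell
#!/bin/sh
# Hypothetical helpers for batch cropping with Ghostscript.

# cropbox <page-height> <x> <y> <width> <height>
# Converts an area measured from the top-left corner (whole points) into the
# "x-left y-bottom x-right y-top" order used by /CropBox, whose origin is
# the bottom-left corner of the page.
cropbox() {
    echo "$2 $(( $1 - $3 - $5 )) $(( $2 + $4 )) $(( $1 - $3 ))"
}

# crop_pdf <in.pdf> <out.pdf> <x-left> <y-bottom> <x-right> <y-top>
# Set GS to use a different Ghostscript binary (assumption: defaults to gs on PATH).
crop_pdf() {
    "${GS:-gs}" -o "$2" -sDEVICE=pdfwrite \
        -c "[/CropBox [$3 $4 $5 $6] /PAGES pdfmark" \
        -f "$1"
}
```

For a 792-point-high page, crop_pdf original.pdf area1.pdf $(cropbox 792 100 200 50 80) would set the CropBox to [100 512 150 592]. The command substitution is left unquoted on purpose so that the four numbers become four separate arguments.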
