
I need to convert any multipage PDF file into a set of JPEGs.

Since the PDF files are supposed to come from a scanner, we can assume each page contains just a single graphic object to extract, but I cannot be 100% sure of that.

So I need to render whatever content each page contains into one JPEG file per page.

How can I do this with iText?

If I can't do this with iText, what Java library can achieve this?

Thanks.

usr-local-ΕΨΗΕΛΩΝ

4 Answers


Ghostscript (available for Windows, Linux, MacOS X, Solaris, AIX,...) can convert...

  • ...from input formats: PDF, PostScript, EPS and AI
  • ...into output formats: JPEG, TIFF, PNG, PNM, PPM, BMP, (and more).

(ImageMagick, mentioned elsewhere in this thread, doesn't do the conversion on its own: it uses Ghostscript under the hood for PDF input, as do many other tools.)
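Ghostscript is a command-line tool, so from Java the simplest route is to launch it as an external process. The following is a minimal sketch, assuming Ghostscript is installed and on the PATH (the executable is usually `gs` on Linux/macOS and `gswin64c` on Windows); the class and method names are hypothetical. Because Ghostscript rasterizes everything drawn on the page, it also covers pages that contain more than a single scanned image.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rasterizes every page of a PDF to JPEG via the Ghostscript CLI.
// Assumes the Ghostscript executable (e.g. "gs" or "gswin64c") is on the PATH.
class GhostscriptJpeg {

    // Builds the argument list. Ghostscript replaces %03d in the output
    // pattern with the 1-based page number (page_001.jpg, page_002.jpg, ...).
    static List<String> buildCommand(String gsExecutable, File pdf, File outDir, int dpi, int quality) {
        List<String> cmd = new ArrayList<>();
        cmd.add(gsExecutable);
        cmd.add("-dSAFER");             // restrict file access by the interpreter
        cmd.add("-dBATCH");             // exit when done instead of waiting for input
        cmd.add("-dNOPAUSE");           // do not pause between pages
        cmd.add("-sDEVICE=jpeg");       // 24-bit colour JPEG output device
        cmd.add("-r" + dpi);            // rendering resolution in dpi
        cmd.add("-dJPEGQ=" + quality);  // JPEG quality, 0-100
        cmd.add("-sOutputFile=" + new File(outDir, "page_%03d.jpg").getPath());
        cmd.add(pdf.getPath());
        return cmd;
    }

    // Runs Ghostscript and fails if it reports an error through its exit code.
    static void convert(String gsExecutable, File pdf, File outDir, int dpi, int quality)
            throws IOException, InterruptedException {
        outDir.mkdirs(); // make sure the output directory exists before Ghostscript writes into it
        Process process = new ProcessBuilder(buildCommand(gsExecutable, pdf, outDir, dpi, quality))
                .inheritIO() // forward Ghostscript's messages to this process's console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Ghostscript exited with code " + exitCode);
        }
    }
}
```

Under those assumptions, `GhostscriptJpeg.convert("gs", new File("scan.pdf"), new File("out"), 150, 90)` would write `out/page_001.jpg`, `out/page_002.jpg`, and so on. The 150 dpi and quality 90 values are arbitrary examples. Since `ProcessBuilder` does not go through a shell, the `%` in the output pattern needs no escaping on any platform.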

Kurt Pfeifle
  • iText uses Ghostscript and ImageMagick under the hood in its tests: Ghostscript to convert PDF into PNG files, and then ImageMagick to compare the PNG files. Check the `CompareTool` class in the iText source code for more information. – Amedee Van Gasse Jun 22 '17 at 08:05

ICEpdf (http://www.icepdf.org/) has an open-source entry-level version which should do what you need.

I believe the primary difference between the open-source version and the paid version is that the paid one has much better font support.

Thorbjørn Ravn Andersen
  • Seems feasible; at least the home page advertises this feature. Do you know of a quick-start guide for JPEG rendering, so we can build a rough proof of concept before requesting official approval to import the package into the project? (I hate bureaucracy, but "dura lex sed lex": the law is harsh, but it is the law.) – usr-local-ΕΨΗΕΛΩΝ Jun 21 '11 at 09:12
  • I looked at the project a while back, but we went a different way, so I do not have first-hand experience with it. I would suggest registering and downloading their distribution to see whether it includes any of the image-rendering demos listed on their web page. – Thorbjørn Ravn Andersen Jun 21 '11 at 09:22

You can also use Sun's PDF-Renderer, and JPedal converts PDF to images (at both low and high resolution).

mark stephens

With Apache PDFBox you could do the following:

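PDDocument document = PDDocument.load(pdffile); // pdffile is the source java.io.File
try {
  // PDFBox 1.x API: getAllPages() returns the pages in document order
  List<PDPage> pages = document.getDocumentCatalog().getAllPages();
  for (int i = 0; i < pages.size(); i++) {
    PDPage page = pages.get(i);
    // Render the page as an RGB image at 72 dpi; a higher dpi gives sharper JPEGs
    BufferedImage image = page.convertToImage(BufferedImage.TYPE_INT_RGB, 72);
    ImageIO.write(image, "jpg", new File(pdffile.getAbsolutePath() + "_" + i + ".jpg"));
  }
} finally {
  document.close(); // release the file handle and parser resources
}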
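The snippet above targets PDFBox 1.x. PDFBox 2.0 removed `getAllPages()` and `PDPage.convertToImage()`; the equivalent there is `new PDFRenderer(document).renderImageWithDPI(i, 72, ImageType.RGB)` for each index up to `document.getNumberOfPages()`. In either version, `ImageIO.write(image, "jpg", ...)` uses the JPEG encoder's default quality. If you want to choose the compression level, a JDK-only sketch along these lines could replace that call. `JpegPageWriter` and its methods are hypothetical names, not part of PDFBox.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper: writes a rendered page image as a JPEG with an explicit quality.
class JpegPageWriter {

    // JPEG has no alpha channel, so copy the image onto an opaque white
    // TYPE_INT_RGB canvas unless it already has that type.
    static BufferedImage toRgb(BufferedImage src) {
        if (src.getType() == BufferedImage.TYPE_INT_RGB) {
            return src;
        }
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, src.getWidth(), src.getHeight());
            g.drawImage(src, 0, 0, null); // transparent pixels end up white
        } finally {
            g.dispose();
        }
        return rgb;
    }

    // quality ranges from 0.0f (smallest file) to 1.0f (best quality).
    // The caller owns and closes `out`.
    static void writeJpeg(BufferedImage image, float quality, OutputStream out) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext()) {
            throw new IOException("No JPEG ImageWriter available");
        }
        ImageWriter writer = writers.next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(toRgb(image), null, null), param);
        } finally {
            writer.dispose();
        }
    }
}
```

Inside the loop, something like `try (OutputStream out = new FileOutputStream(outFile)) { JpegPageWriter.writeJpeg(image, 0.85f, out); }` would write each page; 0.85 is only an example value.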
Christof Aenderl