
I am searching for an open-source Java library that lets me render single pages of PDFs as JPG or PNG on the server side.

Unfortunately, it mustn't use any java.awt.* classes other than:

  • java.awt.datatransfer.DataFlavor
  • java.awt.datatransfer.MimeType
  • java.awt.datatransfer.Transferable

If there is a way, a short code snippet would be fantastic.
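Written as code, the restriction above is a whitelist check over fully qualified class names. This is a minimal sketch; `AwtWhitelist` and `isAllowed` are hypothetical names, and the check only inspects class names, not classes a library might load indirectly.

```java
import java.util.Set;

// Hypothetical helper: encodes the java.awt.* whitelist from the question
class AwtWhitelist {
    // The only java.awt.* classes the restriction permits
    private static final Set<String> ALLOWED = Set.of(
            "java.awt.datatransfer.DataFlavor",
            "java.awt.datatransfer.MimeType",
            "java.awt.datatransfer.Transferable");

    /**
     * Returns true if a fully qualified class name may be used:
     * anything outside java.awt.* is fine; inside it, only the listed classes are.
     */
    static boolean isAllowed(String className) {
        if (!className.startsWith("java.awt.")) {
            return true;
        }
        return ALLOWED.contains(className);
    }
}
```

For example, `isAllowed("java.awt.image.BufferedImage")` returns `false`, which rules out any renderer that paints pages into a `BufferedImage`.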

  • http://stackoverflow.com/questions/11513841/appengine-conversion-api-java shows how to do it with the Google Conversion API. *But* there is one problem: this API will be removed in November. Maybe you can ask Google for tips on an alternative. – halex Aug 31 '12 at 19:25
  • Yes, I had seen that. But, as you wrote, support will soon be discontinued. Otherwise it would have been perfect. I'll try to get some information from Google. – Bommelmutze Aug 31 '12 at 20:41
  • Hi, did you find anything else that does the same conversion? I'm also looking for similar functionality. I know I can request images of PDFs smaller than 25 MB using Google Drive, but I need it to work for bigger files. – DavidVdd Feb 27 '13 at 12:55

2 Answers


I believe ICEpdf might have what you are looking for.

I used this open-source project a while back to turn uploaded PDFs into images for an online catalog.

import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;


// Renders every page of a PDF into an image encoded in the given format ("png", "jpg", ...).
public byte[][] convert(byte[] pdf, String format) {

    Document document = new Document();
    try {
        document.setByteArray(pdf, 0, pdf.length, null);
    } catch (PDFException ex) {
        System.out.println("Error parsing PDF document " + ex);
        return new byte[0][];
    } catch (PDFSecurityException ex) {
        System.out.println("Error encryption not supported " + ex);
        return new byte[0][];
    } catch (IOException ex) {
        System.out.println("Error handling PDF document " + ex);
        return new byte[0][];
    }
    byte[][] imageArray = new byte[document.getNumberOfPages()][];
    // page capture settings
    float scale = 1.75f;
    float rotation = 0f;

    try {
        // Paint each page's content to an image and store the encoded image in the array
        for (int i = 0; i < document.getNumberOfPages(); i++) {
            BufferedImage image = (BufferedImage)
                    document.getPageImage(i,
                                          GraphicsRenderingHints.SCREEN,
                                          Page.BOUNDARY_CROPBOX, rotation, scale);
            try {
                // write the image in the desired format with the standard ImageIO API
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                if (!ImageIO.write(image, format, out)) {
                    throw new IOException("No ImageIO writer for format " + format);
                }
                imageArray[i] = out.toByteArray();

                System.out.println("\t capturing page " + i);

            } catch (IOException e) {
                e.printStackTrace();
            } finally {
                image.flush();
            }
        }
    } finally {
        // clean up resources
        document.dispose();
    }
    return imageArray;
}
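JPEG has no alpha channel, so if a rendered page image carries one, JPEG encoding can fail (on recent JDKs `ImageIO.write` returns `false` for such images instead of writing them). A minimal sketch that flattens the image onto a white RGB canvas before encoding; `JpegEncoder`, `toRgb`, and `toJpeg` are hypothetical names, and whether ICEpdf's page images actually have an alpha channel may depend on the version, so treat this as a precaution rather than a required step.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper: makes any BufferedImage safe to encode as JPEG
class JpegEncoder {
    /** Copies an image onto an opaque white RGB canvas. */
    static BufferedImage toRgb(BufferedImage src) {
        BufferedImage rgb = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            // white background shows through wherever the source is transparent
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, src.getWidth(), src.getHeight());
            g.drawImage(src, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Encodes an image as JPEG bytes, flattening any alpha channel first. */
    static byte[] toJpeg(BufferedImage image) throws IOException {
        BufferedImage opaque = image.getColorModel().hasAlpha() ? toRgb(image) : image;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(opaque, "jpg", out)) {
            throw new IOException("No JPEG writer for this image type");
        }
        return out.toByteArray();
    }
}
```

Inside the page loop, `JpegEncoder.toJpeg(image)` could then be used whenever `format` is `"jpg"`, while PNG output can keep the alpha channel as-is.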

A word of caution, though: I have had trouble with this library throwing a segfault on OpenJDK; it worked fine on Sun's JDK. I'm not sure what it would do on GAE. I can't remember which version had the problem, so just be aware.

natedennis
  • No clue. But while they're downvoting it, I've been running it in production for the last 4 years with no problems at all. – natedennis Mar 07 '15 at 14:05
  • Out of curiosity, have you used pdf-renderer? I was having issues converting a single page of a PDF to PNG using Apache PDFBox, but pdf-renderer seemed to fix it [doing similar to this post](http://stackoverflow.com/questions/19018709/pdfrenderer-export-to-image-exported-inaccurately). I don't hear it talked about much, so I'm concerned I'm missing some issues/downsides to it. – Don Cheadle Mar 09 '15 at 14:56
  • I have not; I wasn't aware of it. Actually, I wrote the first revision of the code above in 2010, and pdf-renderer wasn't started until a year later. It might be a good project to play with. I'm a programmer; I'm always interested in a better way. "Pdf-renderer is a subproject of Swinglabs, was started in January 2011 and has 571 members. The project administrators are rbair, tomoke, joshy, and Jan Haderka." – natedennis Mar 10 '15 at 16:16
  • Heh, so I take it you don't have an opinion of it either way? – Don Cheadle Mar 10 '15 at 16:17

You can use the Apache PDFBox API for this purpose. The following code converts a PDF into JPG images, one image per page.

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Written against the PDFBox 1.8.x API; getAllPages() and convertToImage() no longer exist in 2.x
public void convertPDFToJPG(String src, String folderPath) {

    // create the output folder if it does not exist yet
    File folder = new File(folderPath);
    folder.mkdirs();

    PDDocument doc = null;
    try {
        // load the PDF file into the document object
        doc = PDDocument.load(new File(src));
        // get all pages of the document as a flat list
        List<?> pages = doc.getDocumentCatalog().getAllPages();
        int count = 1; // used to give each image file a unique name
        System.out.println("Please wait...");
        // convert every page of the PDF document to its own image file
        for (Object item : pages) {
            PDPage page = (PDPage) item;
            BufferedImage bi = page.convertToImage();
            ImageIO.write(bi, "jpg", new File(folder, "Page" + count + ".jpg"));
            count++;
        }
        System.out.println("Conversion complete");
    } catch (IOException ie) {
        ie.printStackTrace();
    } finally {
        if (doc != null) {
            try {
                doc.close();
            } catch (IOException ignored) {
                // nothing more to do if closing fails
            }
        }
    }
}
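To make sure stale `PageN.jpg` files from an earlier run don't mix with the new ones, you can empty the output folder before converting. A minimal sketch using `java.nio.file`; `OutputFolder` and `clearFolder` are hypothetical names, and this version deliberately deletes only regular files directly inside the folder, leaving subdirectories alone.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: prepares an empty output folder for page images
class OutputFolder {
    /** Creates the folder if needed and deletes the regular files directly inside it. */
    static void clearFolder(Path folder) throws IOException {
        Files.createDirectories(folder);
        // collect first, then delete, so the directory is not modified mid-listing
        List<Path> files;
        try (Stream<Path> entries = Files.list(folder)) {
            files = entries.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.delete(file);
        }
    }
}
```

Calling `OutputFolder.clearFolder(Paths.get(folderPath))` before the conversion loop gives each run a clean folder.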
fahad
  • The OP clearly indicated that he needs a solution for the "Google App Engine" (GAE). Current PDFBox releases are well-known for *not* working in GAE environments because they use AWT classes not present there. – mkl May 26 '15 at 09:34
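Following up on the comment above, a quick way to see whether a given runtime provides the classes a renderer depends on is to try loading them by name. This is a minimal sketch; `ClassProbe` and `isAvailable` are hypothetical names, and a successful load only shows that the class exists in the current runtime. A sandboxed platform may still restrict its use in other ways, so treat this as a smoke test, not a guarantee.

```java
// Hypothetical helper: checks whether classes can be loaded in this runtime
class ClassProbe {
    /** Returns true if the named class loads, without running its static initializer. */
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // classes used by the rendering code in both answers
        String[] needed = {
            "java.awt.image.BufferedImage",
            "javax.imageio.ImageIO"
        };
        for (String name : needed) {
            System.out.println(name + " -> " + (isAvailable(name) ? "available" : "missing"));
        }
    }
}
```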