Converting PDF to image (with proper formatting)

Question

I have a PDF file (attached). My objective is to convert a PDF page to an image using PDFBox exactly as it appears (the same as using the Snipping Tool in Windows). The PDF contains all kinds of shapes and text.

I am using the following code:

PDDocument doc = PDDocument.load("Hello World.pdf");
// Page indices are zero-based, so index 67 is page 68 of the document.
PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(67);
BufferedImage bufferedImage = page.convertToImage(imageType, screenResolution);
ImageIO.write(bufferedImage, "png", new File("out.png"));

This is the PDF i want to convert

When I use the code, the resulting image is completely wrong (out.png attached). This is the image file produced by PDFBox.

how do i make pdfbox take something like a direct snapshot image?

also, i noticed that the image quality of the png is not so good, is there any way to increase the resolution of the generated image?

EDIT: here is the PDF (see page number 68): https://drive.google.com/file/d/0B0ZiP71EQHz2NVZUcElvbFNreEU/edit?usp=sharing

EDIT 2: it seems that all the text is vanishing. I also tried using the PDFImageWriter class:

test.writeImage(doc, "png", null, 68, 69, "final.png",TYPE_USHORT_GRAY,200 );

same result

You have provided two images, no PDF. (Most probably the PDF was automatically converted upon upload.) To actually provide the PDF you have to share it somewhere else (e.g. a publicly shared file on Dropbox) and post the URL here. – mkl, Mar 11 '14 at 19:15

score 4 · Answer 1 · answered Mar 13 '14 at 11:23

Using PDFRenderer it is possible to convert a PDF page into an image.

Convert a PDF page into an image in Java using PDFRenderer. Required jar: PDFRenderer-0.9.0

package com.pdfrenderer.examples;

import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

import javax.imageio.ImageIO;

import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

public class PdfToImage {
    public static void main(String[] args) {
        try {
            String sourceDir = "C:/Documents/Chemistry.pdf"; // path of the PDF file to convert
            String destinationDir = "C:/Documents/Converted/"; // converted PDF pages are saved in this folder

            File sourceFile = new File(sourceDir);
            File destinationFile = new File(destinationDir);

            String fileName = sourceFile.getName().replace(".pdf", "_cover");

            if (sourceFile.exists()) {
                if (!destinationFile.exists()) {
                    destinationFile.mkdir();
                    System.out.println("Folder created in: " + destinationFile.getCanonicalPath());
                }

                RandomAccessFile raf = new RandomAccessFile(sourceFile, "r");
                FileChannel channel = raf.getChannel();
                ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                PDFFile pdf = new PDFFile(buf);
                int pageNumber = 62; // which PDF page to convert
                PDFPage page = pdf.getPage(pageNumber);

                System.out.println("Total pages: " + pdf.getNumPages());

                // create the image at the size of the page's bounding box
                Rectangle rect = new Rectangle(0, 0, (int) page.getBBox().getWidth(), (int) page.getBBox().getHeight());
                BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);

                // arguments: width & height, clip rect, null for the ImageObserver,
                // fill background with white, block until drawing is done
                Image image = page.getImage(rect.width, rect.height, rect, null, true, true);
                Graphics2D bufImageGraphics = bufferedImage.createGraphics();
                bufImageGraphics.drawImage(image, 0, 0, null);
                bufImageGraphics.dispose(); // release the graphics context once drawing is done

                // to change the file format, change both the extension here and the
                // format name passed to ImageIO.write below (e.g. png, jpg, gif, bmp)
                File imageFile = new File(destinationDir + fileName + "_" + pageNumber + ".png");

                ImageIO.write(bufferedImage, "png", imageFile);
                raf.close(); // the source file is no longer needed

                System.out.println(imageFile.getName() + " File created in: " + destinationFile.getCanonicalPath());
            } else {
                System.err.println(sourceFile.getName() + " does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

ConvertedImage:

Chemistry_cover_62
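
On the file-format comment in the code above: switching the output to JPEG means changing the format name given to ImageIO.write as well as the file extension. The following is only a sketch using the JDK; the class name JpegExport and the method encode are made up for illustration, and a blank in-memory image stands in for the rendered page. ImageIO.write returns false when no writer accepts the image, so the result is worth checking. An image without an alpha channel (such as TYPE_INT_RGB, as used above) is the safe choice for JPEG.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

/** Hypothetical helper, not part of PDFRenderer or PDFBox. */
public class JpegExport {

    /** Encodes the image in the given ImageIO format ("png", "jpg", ...) and returns the bytes. */
    public static byte[] encode(BufferedImage image, String format) {
        // Keep the encoding in memory instead of using a temporary cache file.
        ImageIO.setUseCache(false);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false if no registered writer can handle this image/format pair.
            if (!ImageIO.write(image, format, out)) {
                throw new IllegalArgumentException("No ImageIO writer for format: " + format);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for the rendered page; TYPE_INT_RGB has no alpha channel.
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        byte[] jpeg = encode(page, "jpg");
        System.out.println("JPEG bytes: " + jpeg.length);
    }
}
```

To save to disk instead, the same format name can be passed to ImageIO.write together with a File whose extension matches it.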

@mmcrae See https://java.net/projects/pdf-renderer . However I don't know if that project is still active. The last JIRA comment is from 2013. https://java.net/jira/browse/PDF_RENDERER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel — Tilman Hausherr, Feb 18 '15 at 15:10
@TilmanHausherr I would highly value your input on a similar question [PDF to image losing QR Code](http://stackoverflow.com/questions/28589477/pdfbox-pdf-to-image-losing-qr-code-colorspace-pattern-doesnt-provide-a-non-str) — Don Cheadle, Feb 18 '15 at 17:07
Can you tell me how to do this on Android? Android does not support the java.awt package. – Akanksha Rathore, May 27 '15 at 12:19

score 3 · Answer 2 · mkl · edited Aug 21 '15 at 16:09

I get the same result as the OP using PDFBox version 1.8.4. In version 2.0.0-SNAPSHOT, though, it looks better: only some arrows are thinner and some arrow parts are mis-drawn as boxes.

Thus,

> how do i make pdfbox take something like a direct snapshot image?

The current release versions (up to 1.8.4) seem to have considerable deficits when rendering PDFs as images. You may switch to a current development version (e.g. the current trunk, 2.0.0-SNAPSHOT) or wait until the improvements are released.

Furthermore, some minor deficits remain even in 2.0.0-SNAPSHOT. You might want to present your sample document to the PDFBox people (i.e. create a corresponding issue in their JIRA) so that they improve PDFBox even further to suit your needs.

> also, i noticed that the image quality of the png is not so good, is there any way to increase the resolution of the generated image?

There are convertToImage overloads with resolution parameters. Your current code sets the resolution to screenResolution; pass a larger value (in dpi) to get a sharper image.
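
As a rough guide for choosing that value: PDF page sizes are measured in points (1/72 inch by default), so the pixel size of the rendered image grows linearly with the dpi value. The sketch below only does this arithmetic in plain Java. The class RenderSize and its method are made up for illustration and are not part of PDFBox, and the rounding PDFBox applies internally may differ slightly.

```java
/** Illustrative helper (not part of PDFBox): estimates the rendered image size for a given dpi. */
public class RenderSize {

    /** PDF user space uses 72 units per inch by default. */
    public static final double POINTS_PER_INCH = 72.0;

    /** Converts a page size in points to an approximate {width, height} in pixels at the given dpi. */
    public static int[] pixelsAt(double widthPt, double heightPt, int dpi) {
        double scale = dpi / POINTS_PER_INCH;
        return new int[] {
            (int) Math.round(widthPt * scale),
            (int) Math.round(heightPt * scale)
        };
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points.
        int[] screen = pixelsAt(612, 792, 96);
        int[] print = pixelsAt(612, 792, 300);
        System.out.println("96 dpi:  " + screen[0] + " x " + screen[1]);   // 816 x 1056
        System.out.println("300 dpi: " + print[0] + " x " + print[1]);     // 2550 x 3300
    }
}
```

Going from a typical screen resolution of 96 dpi to 300 dpi roughly triples each dimension, so memory use and file size grow accordingly.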

PS: The code to render a PDF page to image has been refactored in 2.0.0-SNAPSHOT. Instead of

BufferedImage image =  page.convertToImage();

you now do

BufferedImage image =  RenderUtil.convertToImage(page);

I assume this has been done to remove direct AWT references from the core classes because AWT is not available on e.g. Android.

PS: The SNAPSHOT I used last year in this answer was merely a snapshot, subject to change. The 2.0.0 release is still under development and many things have changed. In particular, there is no RenderUtil class anymore. Instead one currently has to use the PDFRenderer in the org.apache.pdfbox.rendering package...

edited Aug 21 '15 at 16:09

answered Mar 12 '14 at 16:43

mkl


but i need this for android :/ also, i didnt quite catch what changes i have to make to my code or project. – harveyslash Mar 12 '14 at 16:58
*i didnt quite catch what changes i have to make to my code or project* - You have to update the PDFBox and FontBox (and JempBox...) versions you use. Having done that you have to follow up on certain changes like how to call the `convertToImage` functionality. – mkl Mar 12 '14 at 17:01
I don't know anything about JempBox. Also, this is the latest jar file I got off the PDFBox website. Edit: my basic goal is to convert a page from a PDF file to a JPEG. Are there any better libraries that can accomplish this task without problems? (I need it to work on Android too.) – harveyslash Mar 12 '14 at 17:02
As the 2.0.0 version has not been released yet, you can merely retrieve a snapshot of the project and build your own jar files. – mkl Mar 12 '14 at 21:59
then suggest me another library that can convert pdf pages to jpeg – harveyslash Mar 13 '14 at 04:20
I cannot. I stumbled over the fact that pdfbox rendering has issues in the current release versions but improved considerably in the development snapshot. I don't really need pdf rendering myself and, therefore, have no suggestions for alternative rendering libs. – mkl Mar 13 '14 at 09:57

score 2 · Accepted Answer · answered Mar 14 '14 at 06:41

It turns out that JPedal (LGPL) does the conversion perfectly (just like a snapshot).

Here is what I have used:

PdfDecoder decode_pdf = new PdfDecoder(true);
FontMappings.setFontReplacements();

try {
    decode_pdf.openPdfFile("Hello World.pdf");
    decode_pdf.setExtractionMode(0, 800, 3);

    try {
        // render 40 pages, starting at page 2, and write each one as <i>out.png
        for (int i = 0; i < 40; i++) {
            BufferedImage img = decode_pdf.getPageAsImage(2 + i);
            ImageIO.write(img, "png", new File(String.valueOf(i) + "out.png"));
        }
    } catch (IOException ex) {
        Logger.getLogger(NewJFrame.class.getName()).log(Level.SEVERE, null, ex);
    }

    decode_pdf.closePdfFile();
} catch (PdfException e) {
    e.printStackTrace();
}

it works fine.

harveyslash

Maybe out of scope... but how'd you include the JPedal JAR's in your project? Did you use a Maven repo? I cannot find one suitable. – Don Cheadle Feb 18 '15 at 00:37
are you using it for free trial or full priced? – Don Cheadle Feb 18 '15 at 00:56
@mmcrae - Is this the relevant jar - http://www.java2s.com/Code/Jar/j/Downloadjpedallgpljar.htm? – Jaydev May 23 '16 at 18:14
How did you manage to get it working on Android? BufferedImage is not available in the Android SDK! – user846316 Sep 15 '16 at 14:01
