4

I have a PDF file (attached). My objective is to convert a PDF page to an image using PDFBox exactly as it appears (the same as capturing it with the Snipping Tool in Windows). The PDF contains all kinds of shapes and text.

I am using the following code:

PDDocument doc = PDDocument.load("Hello World.pdf");
// getAllPages() is zero-based, so index 67 is page 68
PDPage page = (PDPage) doc.getDocumentCatalog().getAllPages().get(67);
BufferedImage bufferedImage = page.convertToImage(imageType, screenResolution);
ImageIO.write(bufferedImage, "png", new File("out.png"));
doc.close();

This is the PDF I want to convert

When I run this code, the resulting image is completely wrong (out.png attached). This is the image file produced by PDFBox

How do I make PDFBox take something like a direct snapshot image?

Also, I noticed that the image quality of the PNG is not very good. Is there any way to increase the resolution of the generated image?

EDIT: Here is the PDF (see page number 68): https://drive.google.com/file/d/0B0ZiP71EQHz2NVZUcElvbFNreEU/edit?usp=sharing

EDIT 2: It seems that all the text is vanishing. I also tried using the PDFImageWriter class:

test.writeImage(doc, "png", null, 68, 69, "final.png",TYPE_USHORT_GRAY,200 );

The result is the same.

Gabor
harveyslash

3 Answers

4

With PDFRenderer it is possible to convert a PDF page into an image. The Java example below does this; it requires the PDFRenderer-0.9.0 jar.

package com.pdfrenderer.examples;

import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

import javax.imageio.ImageIO;

import com.sun.pdfview.PDFFile;
import com.sun.pdfview.PDFPage;

public class PdfToImage {
    public static void main(String[] args) {
        try {
            String sourceDir = "C:/Documents/Chemistry.pdf"; // path of the PDF file to convert
            String destinationDir = "C:/Documents/Converted/"; // converted pages are saved in this folder

            File sourceFile = new File(sourceDir);
            File destinationFile = new File(destinationDir);

            String fileName = sourceFile.getName().replace(".pdf", "_cover");

            if (sourceFile.exists()) {
                if (!destinationFile.exists()) {
                    destinationFile.mkdir();
                    System.out.println("Folder created in: " + destinationFile.getCanonicalPath());
                }

                // try-with-resources closes the file (and its channel) once rendering is done
                try (RandomAccessFile raf = new RandomAccessFile(sourceFile, "r")) {
                    FileChannel channel = raf.getChannel();
                    ByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
                    PDFFile pdf = new PDFFile(buf);
                    int pageNumber = 62; // which PDF page to convert
                    PDFPage page = pdf.getPage(pageNumber);

                    System.out.println("Total pages: " + pdf.getNumPages());

                    // create the image at the size of the page's bounding box
                    Rectangle rect = new Rectangle(0, 0, (int) page.getBBox().getWidth(), (int) page.getBBox().getHeight());
                    BufferedImage bufferedImage = new BufferedImage(rect.width, rect.height, BufferedImage.TYPE_INT_RGB);

                    // arguments: width, height, clip rect, null for the ImageObserver,
                    // fill background with white, block until drawing is done
                    Image image = page.getImage(rect.width, rect.height, rect, null, true, true);
                    Graphics2D bufImageGraphics = bufferedImage.createGraphics();
                    bufImageGraphics.drawImage(image, 0, 0, null);
                    bufImageGraphics.dispose();

                    // the name below only sets the file extension; the format is the second argument of ImageIO.write
                    File imageFile = new File(destinationDir + fileName + "_" + pageNumber + ".png");

                    ImageIO.write(bufferedImage, "png", imageFile);

                    System.out.println(imageFile.getName() + " File created in: " + destinationFile.getCanonicalPath());
                }
            } else {
                System.err.println(sourceFile.getName() + " does not exist");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Converted image:

Chemistry_cover_62

UdayKiran Pulipati
  • is `PDFRenderer` part of a project/have examples? – Don Cheadle Feb 18 '15 at 15:03
  • @mmcrae See https://java.net/projects/pdf-renderer . However I don't know if that project is still active. The last JIRA comment is from 2013. https://java.net/jira/browse/PDF_RENDERER/?selectedTab=com.atlassian.jira.jira-projects-plugin:summary-panel – Tilman Hausherr Feb 18 '15 at 15:10
  • @TilmanHausherr I would highly value your input on a similar question [PDF to image losing QR Code](http://stackoverflow.com/questions/28589477/pdfbox-pdf-to-image-losing-qr-code-colorspace-pattern-doesnt-provide-a-non-str) – Don Cheadle Feb 18 '15 at 17:07
  • Can you tell me how to do this on Android? Android does not support the java.awt package. – Akanksha Rathore May 27 '15 at 12:19
  • @Akanksha I don't have any knowledge of Android right now. – UdayKiran Pulipati Jul 23 '15 at 11:47
3

I get the same result as the OP using PDFBox version 1.8.4. In version 2.0.0-SNAPSHOT, though, it looks better:

[image: the page as rendered with PDFBox 2.0.0-SNAPSHOT]

Here only some arrows are thinner and some arrow parts are mis-drawn as boxes.

Thus,

How do I make PDFBox take something like a direct snapshot image?

The current release versions (up to 1.8.4) seem to have considerable deficiencies when rendering PDFs as images. You may switch to a current development version (e.g. the current trunk, 2.0.0-SNAPSHOT) or wait until the improvements are released.

Furthermore, some minor deficiencies remain even in 2.0.0-SNAPSHOT. You might want to present your sample document to the PDFBox people (i.e. create a corresponding issue in their JIRA) so that they can improve PDFBox even further to suit your needs.

Also, I noticed that the image quality of the PNG is not very good. Is there any way to increase the resolution of the generated image?

There are convertToImage overloads with resolution parameters. Your current code actually sets the resolution to screenResolution. Increase this resolution value.
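As a rough illustration of what a larger resolution value buys you: PDF page sizes are measured in points (72 per inch by default), so the pixel dimensions of the rendered image grow linearly with the DPI you pass in. The sketch below is plain Java and does not call PDFBox; the class name RenderSize, the helper pixelsFor, and the US Letter page size (612 x 792 points) are assumptions chosen only for the example, and the exact rounding a given renderer applies may differ.

```java
final class RenderSize {
    /** Default PDF user space: 72 units (points) per inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Approximate pixel length of a page edge, given in points, at the given DPI. */
    static int pixelsFor(float points, int dpi) {
        return Math.round(points * dpi / POINTS_PER_INCH);
    }

    public static void main(String[] args) {
        // Assumed page size for illustration: US Letter, 612 x 792 points.
        float widthPt = 612f;
        float heightPt = 792f;

        for (int dpi : new int[] {72, 96, 300}) {
            System.out.println(dpi + " dpi -> "
                    + pixelsFor(widthPt, dpi) + " x " + pixelsFor(heightPt, dpi) + " px");
        }
    }
}
```

Under these assumptions a page that comes out at 816 x 1056 pixels at 96 dpi becomes 2550 x 3300 pixels at 300 dpi, which is why passing a higher value than a typical screen resolution gives a sharper PNG.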

PS: The code to render a PDF page to image has been refactored in 2.0.0-SNAPSHOT. Instead of

BufferedImage image =  page.convertToImage();

you now do

BufferedImage image =  RenderUtil.convertToImage(page);

I assume this has been done to remove direct AWT references from the core classes because AWT is not available on e.g. Android.


PS: The SNAPSHOT I used last year in this answer was merely a snapshot, subject to change. The 2.0.0 release is still under development, and many things have changed. In particular, there is no RenderUtil class anymore. Instead one currently has to use the PDFRenderer in the org.apache.pdfbox.rendering package...

mkl
  • But I need this for Android :/ Also, I didn't quite catch what changes I have to make to my code or project. – harveyslash Mar 12 '14 at 16:58
  • *I didn't quite catch what changes I have to make to my code or project* - You have to update the PDFBox and FontBox (and JempBox...) versions you use. Having done that, you have to follow up on certain changes, such as how to call the `convertToImage` functionality. – mkl Mar 12 '14 at 17:01
  • I don't know anything about JempBox. Also, this is the latest jar file I got from the PDFBox website. Edit: my basic goal is to convert a page from a PDF file to a JPEG. Are there any better libraries that can accomplish this task without problems? (I need it to work on Android too.) – harveyslash Mar 12 '14 at 17:02
  • As the 2.0.0 version has not been released yet, you can merely retrieve a snapshot of the project and build your own jar files. – mkl Mar 12 '14 at 21:59
  • Then please suggest another library that can convert PDF pages to JPEG. – harveyslash Mar 13 '14 at 04:20
  • I cannot. I stumbled over the fact that PDFBox rendering has issues in the current release versions but has improved considerably in the development snapshot. I don't really need PDF rendering myself and, therefore, have no suggestions for alternative rendering libraries. – mkl Mar 13 '14 at 09:57
2

It turns out that JPedal (LGPL) does the conversion perfectly (just like a snapshot).

Here is what I used:

PdfDecoder decode_pdf = new PdfDecoder(true);

try {
    // set up JPedal's font substitutions before opening the file
    FontMappings.setFontReplacements();

    decode_pdf.openPdfFile("Hello World.pdf");
    decode_pdf.setExtractionMode(0, 800, 3);

    try {
        // render 40 consecutive pages, starting at page 2, to 0out.png ... 39out.png
        for (int i = 0; i < 40; i++) {
            BufferedImage img = decode_pdf.getPageAsImage(2 + i);
            ImageIO.write(img, "png", new File(String.valueOf(i) + "out.png"));
        }
    } catch (IOException ex) {
        Logger.getLogger(NewJFrame.class.getName()).log(Level.SEVERE, null, ex);
    }

    decode_pdf.closePdfFile();
} catch (PdfException e) {
    e.printStackTrace();
}

It works fine.

harveyslash