5

I want to convert a PDF page to an image file, but the text is missing from the image when I do the conversion in Java.

The file I want to convert is 46_2.pdf; after conversion the output looks like 46_2.png.

Code:

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class ConvertPDFPageToImageWithoutText {
    public static void main(String[] args) {
        try {
            String oldPath = "C:/PDFCopy/46_2.pdf";
            File oldFile = new File(oldPath);
            if (oldFile.exists()) {
                // PDFBox 1.8.x API
                PDDocument document = PDDocument.load(oldFile);
                try {
                    List<PDPage> list = document.getDocumentCatalog().getAllPages();
                    int pageNumber = 1;
                    for (PDPage page : list) {
                        BufferedImage image = page.convertToImage();
                        // One file per page so later pages do not overwrite earlier ones
                        File outputfile = new File("C:/PDFCopy/image_" + pageNumber++ + ".png");
                        ImageIO.write(image, "png", outputfile);
                    }
                } finally {
                    // Close once, after all pages have been rendered
                    document.close();
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
UdayKiran Pulipati
  • I'd try using the convertToImage( type, resolution ) method and see what you get. I bet you're going to have to tinker with the resolution a few times to get it right. http://pdfbox.apache.org/docs/1.8.3/javadocs/org/apache/pdfbox/pdmodel/PDPage.html#convertToImage(int, int) – Robert Beltran Jan 11 '14 at 06:44
  • The 1.8.x versions have deficiencies with font rendering. These have been solved in the unreleased 2.0 version, which you can get with svn from the repository and then build with Maven. – Tilman Hausherr Nov 08 '14 at 23:50
  • @TilmanHausherr can you give me a link for jar download. – UdayKiran Pulipati Nov 11 '14 at 09:50
  • https://pdfbox.apache.org/downloads.html#scm Note that the API is different (especially rendering), so look at the examples to see how it is done. – Tilman Hausherr Nov 11 '14 at 09:51
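
On the resolution suggestion in the comments above: PDF page sizes are expressed in points (1/72 inch), so the pixel size of a rendered page grows linearly with the DPI you ask for. The sketch below only illustrates that arithmetic in plain JDK code. `DpiMath` and its methods are hypothetical names, nothing here calls PDFBox, and the library's own rounding may differ slightly.

```java
// Hypothetical helper: relates a page size in points to pixel size at a given DPI.
class DpiMath {
    // PDF user space uses 72 units (points) per inch.
    static final double POINTS_PER_INCH = 72.0;

    // Scale factor a renderer would need to reach the requested DPI.
    static double scaleFor(double dpi) {
        return dpi / POINTS_PER_INCH;
    }

    // Pixel length of a side that is `points` long when rendered at `dpi`.
    static int pixels(double points, double dpi) {
        return (int) Math.round(points * scaleFor(dpi));
    }

    public static void main(String[] args) {
        double widthPt = 612, heightPt = 792; // US Letter, used only as an example
        for (int dpi : new int[] {72, 96, 200, 300}) {
            System.out.println(dpi + " DPI -> "
                    + pixels(widthPt, dpi) + " x " + pixels(heightPt, dpi) + " px");
        }
    }
}
```

A higher DPI makes the image sharper and larger, but it cannot restore glyphs that the renderer fails to draw in the first place.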

3 Answers

2

Since you're using PDFBox, try PDFImageWriter.writeImage instead of PDPage.convertToImage. This post seems relevant to what you are trying to do.

chairbender
  • Ah, too bad. According to [this link](http://mail-archives.apache.org/mod_mbox/pdfbox-users/201307.mbox/%3Cdef98071-3cb6-4fa2-9dd4-1ea2efcaa0ee@email.android.com%3E) it seems there are known issues with certain fonts. – chairbender Jan 11 '14 at 06:54
  • [PDFImageWriter.writeImage](http://pdfbox.apache.org/docs/1.8.3/javadocs/org/apache/pdfbox/util/PDFImageWriter.html#writeImage%28org.apache.pdfbox.pdmodel.PDDocument,%20java.lang.String,%20java.lang.String,%20int,%20int,%20java.lang.String%29) gives me the same output. – UdayKiran Pulipati Jan 11 '14 at 07:37
  • I understand. I'm telling you that PDFBox [apparently has issues with some fonts](http://mail-archives.apache.org/mod_mbox/pdfbox-users/201307.mbox/%3Cdef98071-3cb6-4fa2-9dd4-1ea2efcaa0ee@email.android.com%3E), so I don't think you'll be able to get PDFBox to successfully preserve that text until the developers fix pdfbox. – chairbender Jan 11 '14 at 07:45
  • `PDFImageWriter.writeImage()` uses `PDPage.convertToImage()` internally and just saves the resulting BufferedImage to the file system. – Nikita Bosik Dec 08 '15 at 13:45
1

I had the same problem. I found an article (unfortunately I can't remember where, because I've read hundreds of them) whose author complained that such problems appeared in PDFBox after they updated Java to 7.21. So I'm using 7.17 and it works for me :)

0

Use the latest version of PDFBox (I am using 2.0.9) and add the JAI Image I/O dependency from here. This sample code runs on Java 7.

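import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.commons.io.FilenameUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// Method of a class that is assumed to declare a `logger` field
public void pdfToImageConvertorUsingPdfBox(String inputPdfPath) throws Exception {
    File sourceFile = new File(inputPdfPath);
    String formatName = "png";
    if (!sourceFile.exists()) {
        logger.error(sourceFile.getName() + " does not exist");
        return;
    }
    // try-with-resources closes the document even if rendering fails
    try (PDDocument document = PDDocument.load(sourceFile)) {
        PDFRenderer pdfRenderer = new PDFRenderer(document);
        int count = document.getNumberOfPages();
        for (int i = 0; i < count; i++) {
            // Page indexes are zero-based; render at 200 DPI in RGB
            BufferedImage image = pdfRenderer.renderImageWithDPI(i, 200, ImageType.RGB);
            // Writes e.g. 46_2_1.png, 46_2_2.png next to the source PDF
            String output = FilenameUtils.removeExtension(inputPdfPath) + "_" + (i + 1) + "." + formatName;
            ImageIO.write(image, formatName, new File(output));
        }
    }
}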
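
The code above relies on `FilenameUtils.removeExtension` from Apache Commons IO to build the per-page output names. If you would rather not pull in that library, a rough JDK-only equivalent might look like the sketch below. `PageFileNames` and its methods are hypothetical names for illustration, and edge cases may be handled differently from the library.

```java
// Hypothetical JDK-only helper for naming per-page output files.
class PageFileNames {
    // Strips the extension only when the last dot belongs to the file name,
    // not to a directory such as "C:/dir.v1/file".
    static String removeExtension(String path) {
        int lastSep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int lastDot = path.lastIndexOf('.');
        return (lastDot > lastSep) ? path.substring(0, lastDot) : path;
    }

    // pageIndex is zero-based; the generated name is one-based.
    static String pageOutputPath(String inputPdfPath, int pageIndex, String formatName) {
        return removeExtension(inputPdfPath) + "_" + (pageIndex + 1) + "." + formatName;
    }

    public static void main(String[] args) {
        System.out.println(pageOutputPath("C:/PDFCopy/46_2.pdf", 0, "png"));
    }
}
```

With this helper, the `output` line in the loop could call `PageFileNames.pageOutputPath(inputPdfPath, i, formatName)` instead.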
rj27