
Using pdfbox, is it possible to convert a PDF (or a PDF byte[]) into an image byte[]? I've looked through several examples online, and the only ones I can find describe how to either write the converted file directly to the filesystem or convert it to a Java AWT object.

I'd rather not incur the IO of writing an image file to the filesystem, reading it into a byte[], and then deleting it.

So this I can do:

String destinationImageFormat = "jpg";
boolean success = false;
InputStream is = getClass().getClassLoader().getResourceAsStream("example.pdf");
PDDocument pdf = PDDocument.load( is, true );

int resolution = 256;
String password = "";
String outputPrefix = "myImageFile";

PDFImageWriter imageWriter = new PDFImageWriter();    

success = imageWriter.writeImage(pdf, 
                    destinationImageFormat, 
                    password, 
                    1, 
                    2, 
                    outputPrefix, 
                    BufferedImage.TYPE_INT_RGB, 
                    resolution);

As well as this:

InputStream is = getClass().getClassLoader().getResourceAsStream("example.pdf");

PDDocument pdf = PDDocument.load( is, true );
List<PDPage> pages = pdf.getDocumentCatalog().getAllPages();

for ( PDPage page : pages )
{
    BufferedImage image = page.convertToImage();
}

What I'm not clear on is how to transform the BufferedImage into a byte[]. I know imageWriter.writeImage() writes it to a file output stream, but I'm not clear on how the API works.

user2100746

3 Answers


You can use ImageIO.write to write to an OutputStream. To get a byte[], use a ByteArrayOutputStream, then call toByteArray() on it.
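A minimal sketch of that suggestion using only the JDK. ImageBytes and toBytes are hypothetical names; the BufferedImage passed in would be the one returned by page.convertToImage() in the question's loop. The main method uses a blank stand-in image so the sketch runs without PDFBox on the classpath.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

public class ImageBytes {
    /**
     * Encodes an already-rendered page image into an in-memory byte[] in the given
     * ImageIO format ("png", "jpg", ...). Nothing is written to the filesystem.
     */
    public static byte[] toBytes(BufferedImage image, String format) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(); // grows as needed
        try {
            // ImageIO.write returns false when no installed writer can encode this image/format.
            if (!ImageIO.write(image, format, out)) {
                throw new IllegalArgumentException("No ImageIO writer for format: " + format);
            }
        } catch (IOException e) {
            // Unlikely for an in-memory stream, but ImageIO.write declares it.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Stand-in for page.convertToImage() so the sketch runs without PDFBox.
        BufferedImage page = new BufferedImage(100, 50, BufferedImage.TYPE_INT_RGB);
        byte[] png = toBytes(page, "png");
        System.out.println("PNG size: " + png.length + " bytes");
    }
}
```

Inside the question's loop this would become `byte[] png = ImageBytes.toBytes(page.convertToImage(), "png");`. Checking the boolean returned by ImageIO.write matters: an unsupported format otherwise yields an empty array silently.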

  • Thanks. This works as intended. If I had enough reputation, I'd vote you up, but this is my first post to StackOverflow. – user2100746 Feb 22 '13 at 22:08
  • Anyone can mark "answer as accepted" on their own questions; it's the fundamental premise of SO. Failing to do so will ensure answerers withhold knowledge from you. #payitforward – angryITguy Dec 01 '13 at 23:12
  • How does this leverage PDFBox (or does it not)? Can you give a code snippet as an example? – Don Cheadle Feb 18 '15 at 00:21
  • @mmcrae the code in the question already leverages pdfbox, this is just the missing part – aditsu quit because SE is EVIL Feb 18 '15 at 05:36
  • @aditsu I've been having issues with ImageIO.write not including images from the source PDF [PDFBox to Image losing QR Code "ColorSpace Pattern doesn't provide a non-stroking color"](http://stackoverflow.com/questions/28589477/pdfbox-pdf-to-image-losing-qr-code-colorspace-pattern-doesnt-provide-a-non-str) – Don Cheadle Feb 18 '15 at 17:11
  • @mmcrae that looks like a problem with pdfbox. Nothing to do with this question. I use mupdf for best results. – aditsu quit because SE is EVIL Feb 18 '15 at 18:16

Add the Maven dependency:

    <!-- https://mvnrepository.com/artifact/org.apache.pdfbox/pdfbox -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.1</version>
    </dependency>

Then convert the PDF to images:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

// Renders every page at 96 DPI and writes it to "<pdf path>.<page number>.png".
private List<String> savePDF(String filePath) throws IOException {
    List<String> result = new ArrayList<>();

    File file = new File(filePath);

    // try-with-resources closes the document even if rendering fails
    try (PDDocument doc = PDDocument.load(file)) {
        PDFRenderer renderer = new PDFRenderer(doc);

        int pageCount = doc.getNumberOfPages();
        for (int i = 0; i < pageCount; i++) {
            String pngFileName = file.getPath() + "." + (i + 1) + ".png";

            try (OutputStream out = new FileOutputStream(pngFileName)) {
                ImageIO.write(renderer.renderImageWithDPI(i, 96), "png", out);
            }

            result.add(pngFileName);
        }
    }
    return result;
}
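When keeping pages in memory instead of on disk, it helps to know how large each rendered bitmap will be. PDF page sizes are measured in points (1/72 inch), and PDFBox 2.x's renderImageWithDPI(i, dpi) scales the page by dpi / 72, so the bitmap is roughly widthInPoints × dpi / 72 pixels wide. The sketch below does that arithmetic. RenderSize, pixels and rasterBytes are hypothetical names, and the floor-with-minimum-1 rounding is an assumption about PDFBox's internals, not a documented guarantee.

```java
public class RenderSize {
    /** PDF user-space units are 1/72 inch, so a page is scaled by dpi / 72 when rendered. */
    public static int pixels(double sizeInPoints, double dpi) {
        // Floor, with a minimum of 1 pixel (assumed to mirror PDFBox 2.x rounding).
        return (int) Math.max(Math.floor(sizeInPoints * dpi / 72.0), 1);
    }

    /** Uncompressed raster size in bytes for an int-per-pixel image such as TYPE_INT_RGB. */
    public static long rasterBytes(int widthPx, int heightPx) {
        return (long) widthPx * heightPx * Integer.BYTES;
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches).
        int w = pixels(612, 96);
        int h = pixels(792, 96);
        System.out.println(w + " x " + h + " px, ~" + rasterBytes(w, h) / (1024 * 1024) + " MiB raw");
    }
}
```

For a US Letter page this gives 816 × 1056 pixels at 96 DPI, and 2176 × 2816 pixels (about 23 MiB as an uncompressed int-per-pixel raster, before PNG or JPEG encoding shrinks it) at the question's resolution of 256.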

EDIT:

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

// Renders every page at 96 DPI and returns one PNG-encoded byte[] per page; nothing is written to disk.
private List<byte[]> renderPDF(String filePath) throws IOException {
    List<byte[]> result = new ArrayList<>();

    try (PDDocument doc = PDDocument.load(new File(filePath))) {
        PDFRenderer renderer = new PDFRenderer(doc);

        int pageCount = doc.getNumberOfPages();
        for (int i = 0; i < pageCount; i++) {
            // ByteArrayOutputStream only has a no-arg and an int (initial size) constructor
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ImageIO.write(renderer.renderImageWithDPI(i, 96), "png", out);

            result.add(out.toByteArray()); // the PNG bytes for page i + 1
        }
    }
    return result;
}
BeeNoisy
  • The OP asked for a way to have pdfbox render a pdf directly to a `byte []`, not a file. Your answer on the other hand only shows another way to have it render to a file. – mkl Dec 27 '16 at 07:11
  • Replace FileOutputStream with ByteArrayOutputStream – BeeNoisy Dec 27 '16 at 09:12
  • `"ByteArrayOutputStream out = new ByteArrayOutputStream(pngFileName)"` - `ByteArrayOutputStream` only has two constructors, one without parameters and one with an `int` parameter. Thus, your call using a `String` parameter will not even compile unless you mean a different `ByteArrayOutputStream` than the one in `java.io`. – mkl Dec 29 '16 at 20:53
// PDFBox 1.8.x API: PDFImageWriter and PDDocument.decrypt() no longer exist in 2.0.
// Note that writeImage() writes image files to disk; it does not return a byte[].
try {
    PDDocument document = PDDocument.load(PdfInfo.getPDFWAY());
    if (document.isEncrypted()) {
        document.decrypt(PdfInfo.getPASSWORD());
    }
    if ("bilevel".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_BINARY);
    } else if ("indexed".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_INDEXED);
    } else if ("gray".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_BYTE_GRAY);
    } else if ("rgb".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_RGB);
    } else if ("rgba".equalsIgnoreCase(PdfInfo.getCOLOR())) {
        PdfInfo.setIMAGETYPE(BufferedImage.TYPE_INT_ARGB);
    } else {
        System.exit(2); // unknown color mode
    }
    PDFImageWriter imageWriter = new PDFImageWriter();
    boolean success = imageWriter.writeImage(document, PdfInfo.getIMAGE_FORMAT(), PdfInfo.getPASSWORD(),
            PdfInfo.getSTART_PAGE(), PdfInfo.getEND_PAGE(), PdfInfo.getOUTPUT_PREFIX(),
            PdfInfo.getIMAGETYPE(), PdfInfo.getRESOLUTION());
    if (!success) {
        System.exit(1);
    }
    document.close();
} catch (IOException | CryptographyException | InvalidPasswordException ex) {
    Logger.getLogger(PdfToImae.class.getName()).log(Level.SEVERE, null, ex);
}
public class PdfInfo {
    private static String PDFWAY;          // path of the input PDF
    private static String OUTPUT_PREFIX;
    private static String PASSWORD;
    private static int START_PAGE = 1;
    private static int END_PAGE = Integer.MAX_VALUE;
    private static String IMAGE_FORMAT = "jpg";
    private static String COLOR = "rgb";
    private static int RESOLUTION = 256;   // DPI
    private static int IMAGETYPE = BufferedImage.TYPE_INT_RGB; // java.awt.image.BufferedImage constant
    private static String filename;
    private static String filePath = "";

    // static getters and setters (getPDFWAY(), setIMAGETYPE(int), ...) omitted
}
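The color-name branch above could also be factored into a small helper that throws instead of calling System.exit, which is friendlier when the conversion runs inside a larger application. This is a sketch, not part of the original answer: ColorModes and imageTypeFor are hypothetical names, and the mapping simply mirrors the if/else chain. One caveat worth checking on your JDK: the default IMAGE_FORMAT is "jpg", and the built-in JPEG writer may refuse images with an alpha channel, so "rgba" fits better with "png".

```java
import java.awt.image.BufferedImage;
import java.util.Locale;
import java.util.Objects;

public class ColorModes {
    /**
     * Maps the answer's color names to BufferedImage type constants.
     * Throws instead of calling System.exit so the caller decides how to react.
     */
    public static int imageTypeFor(String color) {
        Objects.requireNonNull(color, "color");
        switch (color.toLowerCase(Locale.ROOT)) { // case-insensitive, like equalsIgnoreCase
            case "bilevel": return BufferedImage.TYPE_BYTE_BINARY;
            case "indexed": return BufferedImage.TYPE_BYTE_INDEXED;
            case "gray":    return BufferedImage.TYPE_BYTE_GRAY;
            case "rgb":     return BufferedImage.TYPE_INT_RGB;
            case "rgba":    return BufferedImage.TYPE_INT_ARGB;
            default:
                throw new IllegalArgumentException("Unsupported color mode: " + color);
        }
    }

    public static void main(String[] args) {
        // The returned constant can be passed straight to the BufferedImage constructor.
        BufferedImage img = new BufferedImage(8, 8, imageTypeFor("Gray"));
        System.out.println(img.getType() == BufferedImage.TYPE_BYTE_GRAY);
    }
}
```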
Vahap Gencdal