Question
What is the best way to extract images from a PDF document, supplied as a byte[], using iText?
Current Status
The PDF that I get is served as a byte[] from a server. I have no control over how I receive it.
I'm trying to figure out how to extract all images from that multipage PDF and put each image into a BufferedImage[]. There is only one image on each PDF page, so a PDF that is 10 pages long would give me a BufferedImage[10]. My initial implementation read in files and then converted them. Requirements changed, and now I have to use streams for everything that I can.
My old implementation no longer works because MyImageRenderListener does not offer a constructor without parameters (its only constructor takes a file path, which isn't needed with streams), and I can't get rid of the parser because parser.processContent() (which pulls the images out) takes a listener as a parameter.
I'm hoping there is a better solution than the approach I'm currently taking.
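For reference, the "streams only" requirement described above can be illustrated without iText at all. The sketch below uses only the JDK; the class name InMemoryImages and its two methods are hypothetical helpers made up for this illustration. It assumes that the goal is to avoid the file system entirely, which is why it also turns off ImageIO's disk cache (by default ImageIO may spool stream data to a temporary file).

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import javax.imageio.ImageIO;

// Hypothetical helper (not part of iText): encodes and decodes images purely in memory.
final class InMemoryImages {

    static {
        // Keep ImageIO from using temporary cache files for stream-based I/O.
        ImageIO.setUseCache(false);
    }

    // Decode image bytes (PNG, JPEG, ...) into a BufferedImage without using a file.
    static BufferedImage decode(byte[] imageBytes) {
        try (ByteArrayInputStream in = new ByteArrayInputStream(imageBytes)) {
            BufferedImage image = ImageIO.read(in);
            if (image == null) {
                // ImageIO.read returns null when no registered reader understands the data.
                throw new IllegalArgumentException("Unsupported image data");
            }
            return image;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Encode a BufferedImage as PNG into a byte[] instead of writing a file.
    static byte[] encodePng(BufferedImage image) {
        try (ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            if (!ImageIO.write(image, "png", out)) {
                throw new IllegalStateException("No PNG writer available");
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        BufferedImage original = new BufferedImage(4, 3, BufferedImage.TYPE_INT_RGB);
        byte[] png = encodePng(original);   // image -> bytes, no file involved
        BufferedImage decoded = decode(png); // bytes -> image, no file involved
        System.out.println(decoded.getWidth() + "x" + decoded.getHeight()); // prints 4x3
    }
}
```

This only shows the in-memory round trip; it says nothing about how iText locates the images inside the PDF.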
Code
convert()
    public static byte[] convert(byte[] in) throws FileNotFoundException, IOException {
        ByteArrayInputStream input = new ByteArrayInputStream(in);
        ArrayList<BufferedImage> bimgArrL = getBufImgArr(input);
        BufferedImage[] bim = new BufferedImage[bimgArrL.size()];
        bimgArrL.toArray(bim);
        // More code below, not important in this scenario
    }
getBufImgArr()
    // Used with streams
    // TODO: This needs to be a MemoryCacheRandomAccessInputStream
    public static ArrayList<BufferedImage> getBufImgArr(final ByteArrayInputStream bais) throws IOException {
        PdfReader reader = new PdfReader(bais);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        MyImageRenderListener listener = new MyImageRenderListener(); // This is the problem
        for (int page = 1; page <= reader.getNumberOfPages(); page++) {
            parser.processContent(page, listener);
        }
        reader.close();
        return listener.getBimgArray();
    }
MyImageRenderListener
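    // Imports assume iText 5.x package names
    import com.itextpdf.text.pdf.parser.ImageRenderInfo;
    import com.itextpdf.text.pdf.parser.PdfImageObject;
    import com.itextpdf.text.pdf.parser.RenderListener;
    import com.itextpdf.text.pdf.parser.TextRenderInfo;
    import java.awt.image.BufferedImage;
    import java.io.IOException;
    import java.util.ArrayList;

    public class MyImageRenderListener implements RenderListener {
        protected String path = "";
        protected ArrayList<BufferedImage> bimg = new ArrayList<BufferedImage>(); // Added this

        // The only constructor; it requires a path, which the stream version has no use for
        public MyImageRenderListener(String path) {
            this.path = path;
        }

        public ArrayList<BufferedImage> getBimgArray() { // Added this
            return bimg;
        }

        // Text callbacks required by RenderListener; nothing to do for image extraction
        public void beginTextBlock() { }
        public void renderText(TextRenderInfo renderInfo) { }
        public void endTextBlock() { }

        // Called once per image found on the page being processed
        public void renderImage(ImageRenderInfo renderInfo) {
            try {
                PdfImageObject image = renderInfo.getImage();
                if (image == null) {
                    return;
                }
                bimg.add(image.getBufferedImage()); // Added this
            } catch (IOException e) {
                System.out.println(e.getMessage());
            }
        }
    }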
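The constructor issue itself is plain Java rather than anything specific to iText: once a class declares a constructor with parameters, the compiler no longer generates a default one, so new MyImageRenderListener() stops compiling. One possible way around this, sketched here with a hypothetical stand-in class (CollectingListener is not an iText type and does not implement RenderListener), is to add an overloaded no-argument constructor that delegates to the existing one:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a listener that collects items in memory.
// It only illustrates constructor overloading; it is not an iText class.
final class CollectingListener<T> {
    private final String path;
    private final List<T> items = new ArrayList<>();

    // No-argument constructor for the stream-based case: delegates with an empty path.
    CollectingListener() {
        this("");
    }

    // Original file-based constructor, kept so existing callers still compile.
    CollectingListener(String path) {
        this.path = path;
    }

    // Callback: store each non-null item as it is reported.
    void accept(T item) {
        if (item != null) {
            items.add(item);
        }
    }

    // Read-only view of everything collected so far, in arrival order.
    List<T> getItems() {
        return Collections.unmodifiableList(items);
    }

    String getPath() {
        return path;
    }

    public static void main(String[] args) {
        CollectingListener<String> listener = new CollectingListener<>();
        listener.accept("page-1-image");
        listener.accept(null); // ignored
        listener.accept("page-2-image");
        System.out.println(listener.getItems().size()); // prints 2
    }
}
```

Whether the same change is appropriate for MyImageRenderListener depends on whether anything else still relies on the path field.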