Question

What is the best way to extract images from a pdf document, represented as a byte[] stream, using iText?


Current Status

The "pdf" that I get is served as a byte[] from a server. I have no control over how I receive it.

I'm trying to figure out how to extract all images from that multipage pdf, and put each image into a BufferedImage[]. There is only one image on each pdf page, so if I had a pdf that was 10 pages long, I would have a BufferedImage[10]. My initial implementation read in files, and then converted. Requirements changed, and now I have to use streams for everything that I can.

My old implementation no longer works: MyImageRenderListener has no parameterless constructor (its only constructor takes a file path, which is unnecessary when working with streams), and I can't drop the parser because parser.processContent(), which pulls the images out, takes a listener as a parameter.

I'm hoping there is a better solution than the direction that I'm currently coming at the problem from.


Code

convert()

public static byte[] convert(byte[] in) throws FileNotFoundException, IOException {

    ByteArrayInputStream input = new ByteArrayInputStream(in);
    ArrayList<BufferedImage> bimgArrL = getBufImgArr(input);
    BufferedImage[] bim = new BufferedImage[bimgArrL.size()];
    bimgArrL.toArray(bim);
    // More code below, not important in this scenario

}

getBufImgArr()

// Used with streams
public static ArrayList<BufferedImage> getBufImgArr(final ByteArrayInputStream bais) throws IOException { // TODO: This needs to be a MemoryCacheRandomAccessInputStream

    PdfReader reader = new PdfReader(bais);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    MyImageRenderListener listener = new MyImageRenderListener(); // This is the problem

    for (int page = 1; page <= reader.getNumberOfPages(); page++) {
        parser.processContent(page, listener);
    }

    reader.close();
    return listener.getBimgArray();
}

MyImageRenderListener

public class MyImageRenderListener implements RenderListener {

    protected String path = "";
    protected ArrayList<BufferedImage> bimg = new ArrayList<>(); // Added this

    public MyImageRenderListener(String path) {
        this.path = path;
    }

    public ArrayList<BufferedImage> getBimgArray() { //
        return bimg;                                 // Added this
    }                                                //

    public void renderImage(ImageRenderInfo renderInfo) {
        try {

            PdfImageObject image = renderInfo.getImage();
            if (image == null) {
                return;
            }

            bimg.add(image.getBufferedImage()); // Added this

        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
Scrambo
  • That's really strange. How are you using `path` in `MyImageRenderListener`? Why do you need it there? Why can't you add a parameterless constructor to this class? – Alexey Subach Jul 07 '16 at 21:25
  • I agree with @AlexeySubach, I think you misunderstand. `RenderListener` is an interface, the constructor is an optional thing that you put in your class when you implement it. Many samples on the internet use the constructor to set a path but there's no contractual requirement as far as iText is concerned. – Chris Haas Jul 07 '16 at 23:54
  • @Scrambo Either add a new constructor without arguments (or simply remove the single existing constructor) or use a dummy argument. Java 101, isn't it? – mkl Jul 08 '16 at 11:40
  • You guys were right, obviously. I had this notion that since there was only one constructor that took a string, unknown actions might occur later on in the program if I made one without the parameter. After testing, that is most certainly not the case. Thanks! – Scrambo Jul 08 '16 at 14:40
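
For anyone landing here later, the point made in the comments can be sketched without iText on the classpath. This is only an illustration under stated assumptions: `ImageListener`, `CollectingListener` and `collect` are hypothetical stand-ins (not iText types) for `RenderListener`, `MyImageRenderListener` and the page loop in `getBufImgArr()`. It shows that an interface places no requirement on constructors, so the implementing class can simply omit the path-taking one and collect images in memory.

```java
import java.awt.image.BufferedImage;
import java.util.ArrayList;
import java.util.List;

public class ListenerConstructorSketch {

    // Hypothetical stand-in for iText's RenderListener. An interface only
    // declares methods, so it cannot demand any particular constructor.
    interface ImageListener {
        void renderImage(BufferedImage image);
    }

    // Stand-in for MyImageRenderListener with the String-path constructor
    // removed. With no explicit constructor, the compiler supplies a
    // default no-argument one.
    static class CollectingListener implements ImageListener {
        private final List<BufferedImage> images = new ArrayList<>();

        @Override
        public void renderImage(BufferedImage image) {
            if (image == null) {
                return; // mirrors the null check in the original listener
            }
            images.add(image);
        }

        BufferedImage[] toArray() {
            return images.toArray(new BufferedImage[0]);
        }
    }

    // Stand-in for the per-page loop: one dummy 1x1 image per "page"
    // instead of images pulled out of a real PDF.
    public static BufferedImage[] collect(int pageCount) {
        CollectingListener listener = new CollectingListener();
        for (int page = 1; page <= pageCount; page++) {
            listener.renderImage(new BufferedImage(1, 1, BufferedImage.TYPE_INT_RGB));
        }
        return listener.toArray();
    }

    public static void main(String[] args) {
        System.out.println(collect(10).length);
    }
}
```

In the real listener the same change amounts to deleting the `path` field and its constructor (or adding an empty constructor next to it); the rest of the class stays as posted.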
