I have an existing PDF from which I want to retrieve images

NOTE:

In the Documentation, this is the RESULT variable

public static final String RESULT = "results/part4/chapter15/Img%s.%s";

I don't understand why this image is needed. I just want to extract the images from my PDF file.

Now, when I use MyImageRenderListener listener = new MyImageRenderListener(RESULT);

I am getting the error:

results\part4\chapter15\Img16.jpg (The system cannot find the path specified)

This is the code I have:

    package part4.chapter15;

    import java.io.IOException;


    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.parser.PdfReaderContentParser;

    /**
     * Extracts images from a PDF file.
     */
    public class ExtractImages {

    /** The PDF from which the images are extracted. */
    public static final String RESOURCE = "resources/pdfs/samplefile.pdf";
    /** Pattern for the output files: object number, then file extension. */
    public static final String RESULT = "results/part4/chapter15/Img%s.%s";
    /**
     * Parses a PDF and extracts all the images.
     * @param filename the source PDF
     */
    public void extractImages(String filename)
        throws IOException, DocumentException {
        PdfReader reader = new PdfReader(filename);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        MyImageRenderListener listener = new MyImageRenderListener(RESULT);
        for (int i = 1; i <= reader.getNumberOfPages(); i++) {
            parser.processContent(i, listener);
        }
        reader.close();
    }

    /**
     * Main method.
     * @param    args    no arguments needed
     * @throws DocumentException 
     * @throws IOException
     */
    public static void main(String[] args) throws IOException, DocumentException {
        new ExtractImages().extractImages(RESOURCE);
    }
    }
Abhinav
  • possible duplicate of [The requested operation cannot be performed on a file with a user-mapped section open](http://stackoverflow.com/questions/4658354/the-requested-operation-cannot-be-performed-on-a-file-with-a-user-mapped-section) – Jongware Aug 12 '15 at 10:37
  • Hi, that's not the error anymore. Please see the issue; I have edited the question. – Abhinav Aug 12 '15 at 10:45

1 Answer

You have two questions and the answer to the first question is the key to the answer of the second.

Question 1:

You refer to:

public static final String RESULT = "results/part4/chapter15/Img%s.%s";

And you ask: why is this image needed?

That question rests on a wrong assumption: Img%s.%s is not the filename of an image, it's a pattern for the filenames of the extracted images. While parsing, iText detects the images in the PDF. Each image is stored in a numbered object (e.g. object 16), and images can be exported in different formats (e.g. jpg, png, ...).

Suppose that an image is stored in object 16 and that this image is a jpg, then the pattern will resolve to Img16.jpg.
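To make the pattern concrete, here is a minimal sketch that uses only String.format from the JDK. The class name PatternDemo and the helper resolve are made up for illustration; only the RESULT string comes from the question.

```java
final class PatternDemo {
    // Same pattern as RESULT in the question.
    static final String RESULT = "results/part4/chapter15/Img%s.%s";

    /** Fills the pattern with an object number and a file extension. */
    static String resolve(int objectNumber, String fileType) {
        return String.format(RESULT, objectNumber, fileType);
    }

    public static void main(String[] args) {
        System.out.println(resolve(16, "jpg")); // results/part4/chapter15/Img16.jpg
        System.out.println(resolve(23, "png")); // results/part4/chapter15/Img23.png
    }
}
```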

Question 2:

Why do I get an error:

results\part4\chapter15\Img16.jpg (The system cannot find the path specified)

In your PDF, there's a jpg stored in object 16. You are asking iText to store that image using this path: results\part4\chapter15\Img16.jpg (as explained in my answer to Question 1). However, your working directory doesn't have the subdirectories results\part4\chapter15\, so FileOutputStream throws a FileNotFoundException (a subclass of IOException).
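The failure can be reproduced without iText. The sketch below is only an illustration: the class MissingDirectoryDemo and the helper canWrite are hypothetical, and it works in a temporary directory rather than in your project. It shows that FileOutputStream does not create missing parent directories, and that the same write succeeds once Files.createDirectories has created them.

```java
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class MissingDirectoryDemo {

    /** Returns true if the file could be written, false if opening it failed. */
    static boolean canWrite(Path target) throws IOException {
        try (FileOutputStream os = new FileOutputStream(target.toFile())) {
            os.write(new byte[] {1, 2, 3}); // placeholder bytes, not a real image
            return true;
        } catch (FileNotFoundException e) {
            // Thrown when the parent directories do not exist.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("extract-demo");
        Path target = base.resolve("results/part4/chapter15/Img16.jpg");

        System.out.println(canWrite(target)); // false: subdirectories are missing

        Files.createDirectories(target.getParent());
        System.out.println(canWrite(target)); // true: the path now exists
    }
}
```

Creating the results/part4/chapter15 directories up front, by hand or in code, is therefore another way to keep the original RESULT pattern.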

What is the general problem?

You have copy/pasted the ExtractImages example I wrote for my book "iText in Action - Second Edition", but:

  1. You didn't read that book, so you have no idea what that code is supposed to do.
  2. You aren't telling the readers on StackOverflow that this example depends on the MyImageRenderListener class, which is where all the magic happens.

How can you solve your problem?

Option 1:

Change RESULT like this:

public static final String RESULT = "Img%s.%s";

Now the images will be stored in your working directory.

Option 2:

Adapt the MyImageRenderListener class, more specifically this method:

public void renderImage(ImageRenderInfo renderInfo) {
    try {
        String filename;
        FileOutputStream os;
        PdfImageObject image = renderInfo.getImage();
        if (image == null) return;
        filename = String.format(path,
            renderInfo.getRef().getNumber(), image.getFileType());
        os = new FileOutputStream(filename);
        os.write(image.getImageAsBytes());
        os.flush();
        os.close();
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}

iText calls this method whenever an image is encountered. It passes an ImageRenderInfo object to the method; that object contains plenty of information about the image.

In this implementation, we store the image bytes as a file. This is how we create the path to that file:

String.format(path,
     renderInfo.getRef().getNumber(), image.getFileType())

As you can see, the pattern stored in RESULT is used in such a way that the first occurrence of %s is replaced with a number and the second occurrence with a file extension.

You could easily adapt this method so that it stores the images as byte[] in a List if that is what you want.
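As a rough sketch of that last idea, the class below keeps the extracted images in memory instead of writing files. ImageCollector and ExtractedImage are hypothetical names, not part of iText or of the book example. The code deliberately avoids iText types so that it runs on its own, and the comment shows where the values would come from inside renderImage.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ImageCollector {

    /** One extracted image: object number, file type and raw bytes. */
    static final class ExtractedImage {
        final int objectNumber;
        final String fileType;
        final byte[] bytes;

        ExtractedImage(int objectNumber, String fileType, byte[] bytes) {
            this.objectNumber = objectNumber;
            this.fileType = fileType;
            this.bytes = bytes;
        }
    }

    private final List<ExtractedImage> images = new ArrayList<>();

    // Assumption: inside renderImage(ImageRenderInfo renderInfo) this would
    // replace the FileOutputStream code, called roughly as
    //   collector.add(renderInfo.getRef().getNumber(),
    //                 image.getFileType(), image.getImageAsBytes());
    void add(int objectNumber, String fileType, byte[] bytes) {
        images.add(new ExtractedImage(objectNumber, fileType, bytes.clone()));
    }

    /** Read-only view of everything collected so far. */
    List<ExtractedImage> getImages() {
        return Collections.unmodifiableList(images);
    }

    public static void main(String[] args) {
        ImageCollector collector = new ImageCollector();
        collector.add(16, "jpg", new byte[] {1, 2, 3}); // placeholder bytes
        System.out.println(collector.getImages().size()); // 1
    }
}
```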

Bruno Lowagie
  • Hi Bruno, thank you so much. I was completely lost, as I am more of a PHP developer. I solved it. – Abhinav Aug 13 '15 at 05:36
  • Please download the free ebook [The Best iText Questions on StackOverflow](http://pages.itextpdf.com/ebook-stackoverflow-questions.html) so that you have the most common iText questions in one place. For instance [How to get the co-ordinates of an image?](http://stackoverflow.com/questions/24055187/get-co-ordinates-of-image-in-pdf) tells you how to find the width and height of the image on the page. You'll need `java.awt.BufferedImage` to find out the size in pixels. – Bruno Lowagie Aug 13 '15 at 07:15
  • Hi Bruno, this is what I did using the matrix results: `xPosition = matrix.get(Matrix.I31); yPosition = matrix.get(Matrix.I32); width = matrix.get(Matrix.I11); height = matrix.get(Matrix.I22);` So I just have to divide width/xPosition for x-DPI and height/yPosition for y-DPI? – Abhinav Aug 13 '15 at 13:37
  • No, please use common sense, and you'll understand that `xPosition` and `yPosition` have nothing to do whatsoever with the resolution. Just think about it: does moving an image around change the resolution? Of course not! Just follow my advice and use a `BufferedImage` to get the number of pixels. – Bruno Lowagie Aug 13 '15 at 13:40
  • I used BufferedImage like this `PdfImageObject image = renderInfo.getImage(); BufferedImage bi = image.getBufferedImage();` But I am getting this error: Unsupported Image Type. How can I pass the image to `getBufferedImage` function? – Abhinav Aug 13 '15 at 14:18
  • But it's a JPG! You have its bytes! You may even have the image as a file (`img16.jpg`). That's explained [here](http://stackoverflow.com/a/15895924/1622493) and [here](http://stackoverflow.com/a/10392050/1622493) and in many other places. Getting the number of pixels is explained [here](http://stackoverflow.com/a/1604335/1622493) and [here](http://stackoverflow.com/q/6524196/1622493) and in many other places. Your question has been answered many times before.
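For readers following this comment thread, here is a hedged sketch of the approach suggested above: decode the image bytes with the JDK's ImageIO and read the size in pixels from the resulting BufferedImage. The class ImageSizeDemo and its helpers decode and dpi are hypothetical. The in-memory PNG stands in for the bytes returned by image.getImageAsBytes(). The dpi helper assumes an unrotated image whose rendered width in points is Matrix.I11, with 1 point = 1/72 inch.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

final class ImageSizeDemo {

    /** Decodes image bytes; ImageIO.read returns null if no reader understands them. */
    static BufferedImage decode(byte[] imageBytes) throws IOException {
        BufferedImage bi = ImageIO.read(new ByteArrayInputStream(imageBytes));
        if (bi == null) {
            throw new IOException("No ImageIO reader for these bytes");
        }
        return bi;
    }

    /** Pixels per inch, given a size in pixels and the rendered size in points. */
    static double dpi(int pixels, double sizeInPoints) {
        return pixels * 72.0 / sizeInPoints;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for image.getImageAsBytes(): a 200x100 PNG built in memory.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB), "png", out);

        BufferedImage bi = decode(out.toByteArray());
        System.out.println(bi.getWidth() + " x " + bi.getHeight()); // 200 x 100

        // Assumed rendered width of 144 points (2 inches).
        System.out.println(dpi(bi.getWidth(), 144.0)); // 100.0
    }
}
```

The position values (Matrix.I31, Matrix.I32) play no part in this calculation, which matches the point made in the comments.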