I found some examples of how to extract images from a PDF using iText. What I am looking for, however, is a way to get images from a PDF by their coordinates.

Is it possible? If so, how can it be done?

rizzz86
  • The example you refer to extracts the image resources present in the PDF. These images are not necessarily exactly the images displayed because a) there may be more resources than are actually used and b) inline images are ignored by this approach. Thus, you should use the classes from the iText `parser` package. Look at the iText sample [ExtractImages](http://itextpdf.com/examples/iia.php?id=284). The `ImageRenderInfo` object contains coordinate information. – mkl Jun 16 '14 at 07:41
  • Thanks @mkl for pointing me to the correct example. Can you please tell me how we can use coordinates to get an image from this example? – rizzz86 Jun 16 '14 at 08:00

1 Answer

Along the lines of the iText example ExtractImages, you can extract images with code like this:

PdfReader reader = new PdfReader(resourceStream);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
ImageRenderListener listener = new ImageRenderListener("testpdf");

for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    parser.processContent(i, listener);
}

The ImageRenderListener is defined like this:

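import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfDictionary;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfImageObject;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

class ImageRenderListener implements RenderListener
{
    final String name;
    // Fallback numbering for images without an object reference (e.g. inline images)
    int counter = 100000;

    public ImageRenderListener(String name)
    {
        this.name = name;
    }

    // Text events are irrelevant for image extraction
    public void beginTextBlock() { }
    public void renderText(TextRenderInfo renderInfo) { }
    public void endTextBlock() { }

    public void renderImage(ImageRenderInfo renderInfo)
    {
        try
        {
            PdfImageObject image = renderInfo.getImage();
            if (image == null) return;
            int number = renderInfo.getRef() != null ? renderInfo.getRef().getNumber() : counter++;
            String filename = String.format("%s-%s.%s", name, number, image.getFileType());
            // try-with-resources closes the stream even if writing fails
            try (FileOutputStream os = new FileOutputStream(filename))
            {
                os.write(image.getImageAsBytes());
            }

            // If the image has a soft mask, write it to a separate file
            PdfDictionary imageDictionary = image.getDictionary();
            PRStream maskStream = (PRStream) imageDictionary.getAsStream(PdfName.SMASK);
            if (maskStream != null)
            {
                PdfImageObject maskImage = new PdfImageObject(maskStream);
                filename = String.format("%s-%s-mask.%s", name, number, maskImage.getFileType());
                try (FileOutputStream os = new FileOutputStream(filename))
                {
                    os.write(maskImage.getImageAsBytes());
                }
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}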

As you can see, the ImageRenderListener method renderImage receives an ImageRenderInfo argument. This argument has the methods

  • getStartPoint giving you a vector in User space representing the start point of the xobject and
  • getImageCTM giving you the coordinate transformation matrix active when this image was rendered. Coordinates are in User space.

The latter tells you exactly which transformations are applied to a 1x1 user space unit square to actually draw the image. As you are aware, an image may be rotated, stretched, skewed, and moved (the former method actually derives its result from the "moved" part of this matrix).
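
To select images by coordinates, one option is to map the corners of the unit square through that matrix and compare the resulting bounding box with the region you are interested in. The following is only a minimal sketch of that arithmetic, not part of the iText example: the class `ImageBounds` and its methods are hypothetical names, and the six parameters `a, b, c, d, e, f` are assumed to be the entries of the matrix in the usual PDF order, i.e. x' = a·x + c·y + e and y' = b·x + d·y + f.

```java
import java.util.Arrays;

public class ImageBounds
{
    /**
     * Returns {llx, lly, urx, ury}: the bounding box, in user space, of the
     * unit square after applying the matrix [a b 0; c d 0; e f 1].
     */
    public static double[] boundingBox(double a, double b, double c, double d, double e, double f)
    {
        // Images of the corners (0,0), (1,0), (0,1), (1,1)
        double[] xs = { e, a + e, c + e, a + c + e };
        double[] ys = { f, b + f, d + f, b + d + f };
        return new double[] {
            Arrays.stream(xs).min().getAsDouble(),
            Arrays.stream(ys).min().getAsDouble(),
            Arrays.stream(xs).max().getAsDouble(),
            Arrays.stream(ys).max().getAsDouble()
        };
    }

    /** True if the box overlaps the region given by its lower left and upper right corner. */
    public static boolean intersects(double[] box, double llx, double lly, double urx, double ury)
    {
        return box[0] < urx && box[2] > llx && box[1] < ury && box[3] > lly;
    }

    public static void main(String[] args)
    {
        // An unrotated image drawn 200 units wide and 100 units high at (50, 400)
        double[] box = boundingBox(200, 0, 0, 100, 50, 400);
        System.out.println(Arrays.toString(box));
        // Does it touch the region from (0, 350) to (300, 600)?
        System.out.println(intersects(box, 0, 350, 300, 600));
    }
}
```

Inside renderImage you would fill the six values from `renderInfo.getImageCTM()`; to my knowledge the iText 5 `Matrix` class exposes them via `get(Matrix.I11)`, `get(Matrix.I12)`, `get(Matrix.I21)`, `get(Matrix.I22)`, `get(Matrix.I31)` and `get(Matrix.I32)`, but please check this against the version you use. You would then write the image to a file only if `intersects` returns true for your region.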

mkl
  • The image is extracted successfully, thanks. But somehow the image is not right: it has some random colors on it. Please help me get a clear image as the PDF shows it. – Riddhi Shah May 01 '19 at 09:15
  • @Riddhi please make that a question in its own right and supply an example PDF for which the issue occurs. And please be aware that image extraction is about extracting the original source image, not about how the image eventually is displayed in a PDF viewer, as the PDF might apply additional effects to the page on which the original image has been drawn. – mkl May 01 '19 at 10:18