
I need to extract images from a PDF. I know that some of the images are rotated by 90 degrees (I checked with online tools).

I'm using this code:

PdfRenderListener:

public class PdfRenderListener : IExtRenderListener
{
    // other methods ...

    public void RenderImage(ImageRenderInfo renderInfo)
    {
        try
        {
            var mtx = renderInfo.GetImageCTM();
            var image = renderInfo.GetImage();
            var fillColor = renderInfo.GetCurrentFillColor();
            var color = Color.FromArgb(fillColor?.RGB ?? Color.Empty.ToArgb());
            var fileType = image.GetFileType();
            var extension = "." + fileType;
            var bytes = image.GetImageAsBytes();
            var height = mtx[Matrix.I22];
            var width = mtx[Matrix.I11];

            // rotated image (90° or 270°): the scale factors sit in the off-diagonal entries
            if (height == 0 && width == 0)
            {
                height = Math.Abs(mtx[Matrix.I12]);
                width = Math.Abs(mtx[Matrix.I21]);
            }

            // save image
        }
        catch (Exception e)
        {
            Console.WriteLine(e);
        }
    }
}

When I save the images with this code, the rotated ones come out distorted.

I have read the post "iText 7 ImageRenderInfo Matrix contains negative height on Even number Pages" and mkl's answer to it.

In the current transformation matrix (mtx) I have these values:

0 841.9 0
-595.1 0 0
595.1 0 1
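For reference, this is what that matrix does to the image, assuming iText's usual layout where the rows are (a b 0), (c d 0), (e f 1) and a point (x, y) of the image's unit square is multiplied as a row vector:

$$
\begin{pmatrix} x' & y' & 1 \end{pmatrix}
=
\begin{pmatrix} x & y & 1 \end{pmatrix}
\begin{pmatrix} 0 & 841.9 & 0 \\ -595.1 & 0 & 0 \\ 595.1 & 0 & 1 \end{pmatrix}
\quad\Longrightarrow\quad
x' = 595.1\,(1 - y), \qquad y' = 841.9\,x
$$

The corners (0,0), (1,0), (1,1), (0,1) land on (595.1, 0), (595.1, 841.9), (0, 841.9), (0, 0). The image's horizontal axis therefore runs up the page over 841.9 units and its vertical axis runs right to left over 595.1 units: a 90° counter-clockwise rotation combined with a stretch into a 595.1 × 841.9 box.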

I know the image is rotated by 90 degrees. How can I transform it to get a normal image?

Azamat
  • Essentially that transformation matrix rotates the image by 90°, stretches the rotated image to 595.1×841.9 units, and moves the stretched image into the first quadrant. To get an analogous image, you have to apply the same steps using some bitmap image processing API. Probably, therefore, you should change the focus of your question to how to execute these bitmap image manipulation steps. – mkl Jan 24 '22 at 15:33
  • Hi, do you have a sample PDF to reproduce the behavior you are facing? Have you tried iText pdf2Data (https://pdf2data.online/)? As @mkl mentioned, you would need to just do the image rotation yourself if you want to use low level iText Core functionality, and you can find some recipes here: https://stackoverflow.com/questions/8639567/java-rotating-images – Alexey Subach Feb 06 '22 at 11:53
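A minimal Java sketch of the decomposition mkl describes, assuming the matrix only rotates and scales (no skew and no mirroring, i.e. a*c + b*d = 0 and a*d - b*c > 0, which holds for the values in the question). The class and method names are hypothetical helpers, not part of iText.

```java
/** Hypothetical helper: reads rotation and stretch out of an image CTM with rows (a b 0), (c d 0), (e f 1). */
final class CtmDecomposition {

    private CtmDecomposition() {
    }

    /** Counter-clockwise angle (0 inclusive to 360 exclusive) by which the image's x axis is turned on the page. */
    static double rotationDegrees(double a, double b) {
        double deg = Math.toDegrees(Math.atan2(b, a));
        return deg < 0 ? deg + 360 : deg;
    }

    /** User-space length that the image's full pixel width is stretched to. */
    static double stretchedWidth(double a, double b) {
        return Math.hypot(a, b);
    }

    /** User-space length that the image's full pixel height is stretched to. */
    static double stretchedHeight(double c, double d) {
        return Math.hypot(c, d);
    }

    /** True if the matrix is a pure rotation plus positive scaling (no skew, no mirroring). */
    static boolean isRotationAndScale(double a, double b, double c, double d) {
        double eps = 1e-6 * (Math.abs(a) + Math.abs(b) + Math.abs(c) + Math.abs(d));
        return Math.abs(a * c + b * d) <= eps && a * d - b * c > 0;
    }
}
```

For the question's matrix, rotationDegrees(0, 841.9) comes out as 90, stretchedWidth(0, 841.9) as 841.9 and stretchedHeight(-595.1, 0) as 595.1: the image's width ends up vertical, which is why the displayed box is 595.1 wide and 841.9 tall.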

2 Answers


As @mkl mentioned, the real cause was not the rotation of the image but the filter applied to it.

I analyzed the PDF file with iText RUPS and found that the image was encoded with a CCITTFaxDecode filter (RUPS screen).
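The decoding method below takes width, height and k, which come from the image's /DecodeParms dictionary, where several entries may be omitted. A minimal Java sketch of reading them with the defaults given in the PDF specification (K = 0, Columns = 1728, Rows = 0, BlackIs1 = false). The Map is a hypothetical stand-in for the parsed dictionary, not an iText type, and falling back to the image's /Height when /Rows is 0 is an assumption.

```java
import java.util.Map;

/** Hypothetical holder for the CCITTFaxDecode parameters, filled with PDF-spec defaults. */
final class CcittParams {
    final int k;            // < 0: Group 4, 0: Group 3 1-D, > 0: Group 3 mixed 1-D/2-D
    final int columns;      // pixels per row
    final int rows;         // rows actually used (falls back to the image height)
    final boolean blackIs1; // false: 0 bits are black in the decoded image

    CcittParams(int k, int columns, int rows, boolean blackIs1) {
        this.k = k;
        this.columns = columns;
        this.rows = rows;
        this.blackIs1 = blackIs1;
    }

    /** Reads the entries from a parsed /DecodeParms map, applying the spec defaults for missing keys. */
    static CcittParams fromDecodeParms(Map<String, Object> parms, int imageHeight) {
        int k = ((Number) parms.getOrDefault("K", 0)).intValue();
        int columns = ((Number) parms.getOrDefault("Columns", 1728)).intValue();
        int rows = ((Number) parms.getOrDefault("Rows", 0)).intValue();
        boolean blackIs1 = (Boolean) parms.getOrDefault("BlackIs1", Boolean.FALSE);
        // Assumption: when /Rows is absent or 0, use the image dictionary's /Height instead
        if (rows <= 0) {
            rows = imageHeight;
        }
        return new CcittParams(k, columns, rows, blackIs1);
    }
}
```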

Next, I looked for ways to decode this filter and found these questions:

  1. Extracting image from PDF with /CCITTFaxDecode filter.
  2. How to use Bit Miracle LibTiff.Net to write the image to a MemoryStream

I used the BitMiracle.LibTiff.NET library.

I wrote this method:

    // Wraps the raw CCITT data in a single-strip TIFF that LibTiff.Net and image viewers can decode
    private byte[] DecodeInternal(byte[] rawBytes, int width, int height, int k, int bitsPerComponent)
    {
        var compression = GetCompression(k);

        using var ms = new MemoryStream();
        var tms = new TiffStream();

        using var tiff = Tiff.ClientOpen("in-memory", "w", ms, tms);
        tiff.SetField(TiffTag.IMAGEWIDTH, width);
        tiff.SetField(TiffTag.IMAGELENGTH, height);
        tiff.SetField(TiffTag.COMPRESSION, compression);
        tiff.SetField(TiffTag.BITSPERSAMPLE, bitsPerComponent);
        tiff.SetField(TiffTag.SAMPLESPERPIXEL, 1);

        // The still-encoded CCITT bytes become the image's only strip
        var writeResult = tiff.WriteRawStrip(0, rawBytes, rawBytes.Length);
        if (writeResult == -1)
        {
            Console.WriteLine("Decoding error");
        }

        // Write the directory so the TIFF header and tags end up in the stream
        tiff.CheckpointDirectory();
        var decodedBytes = ms.ToArray();
        tiff.Close();

        return decodedBytes;
    }

    // Maps the /K decode parameter to the TIFF compression scheme
    private Compression GetCompression(int k)
    {
        return k switch
        {
            < 0 => Compression.CCITTFAX4, // pure two-dimensional (Group 4)
            0 => Compression.CCITTFAX3,   // pure one-dimensional (Group 3)
            _ => throw new NotImplementedException("K > 0"), // mixed 1-D/2-D, not handled here
        };
    }
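GetCompression leaves K > 0 unimplemented. In TIFF terms that case is presumably still Compression 3 (CCITT Group 3, T.4) with bit 0 of the T4Options tag (GROUP3OPTIONS in libtiff) set to mark 2-D coding. A minimal Java sketch of the full mapping, using the numeric values from the TIFF 6.0 specification; the class is hypothetical and the K > 0 branch has not been checked against real files.

```java
/** Hypothetical mapping from the PDF /K parameter to TIFF Compression and T4Options values. */
final class CcittTiffTags {
    static final int COMPRESSION_CCITT_T4 = 3;  // TIFF "CCITT Group 3 fax"
    static final int COMPRESSION_CCITT_T6 = 4;  // TIFF "CCITT Group 4 fax"
    static final long T4_OPTION_2D_CODING = 1L; // T4Options bit 0

    final int compression;
    final long t4Options; // only meaningful together with COMPRESSION_CCITT_T4

    private CcittTiffTags(int compression, long t4Options) {
        this.compression = compression;
        this.t4Options = t4Options;
    }

    static CcittTiffTags forK(int k) {
        if (k < 0) {
            return new CcittTiffTags(COMPRESSION_CCITT_T6, 0L);              // pure 2-D (Group 4)
        }
        if (k == 0) {
            return new CcittTiffTags(COMPRESSION_CCITT_T4, 0L);              // pure 1-D (Group 3)
        }
        return new CcittTiffTags(COMPRESSION_CCITT_T4, T4_OPTION_2D_CODING); // mixed 1-D/2-D
    }
}
```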

After decoding and rotating the image, I was able to save a normal image. Thanks everyone for the help.
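Once the pixels are decoded, the rotation itself is plain bitmap work. A minimal Java sketch using java.awt.image.BufferedImage (a stand-in for whatever bitmap API is used after LibTiff.Net on the C# side), turning the image 90° counter-clockwise to match the matrix in the question; the class name is hypothetical, and for other files the direction has to be read from the CTM.

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: turns a decoded bitmap 90 degrees counter-clockwise. */
final class ImageRotation {

    private ImageRotation() {
    }

    static BufferedImage rotate90CounterClockwise(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        // TYPE_CUSTOM (0) cannot be used to create a new image, so fall back to ARGB
        int type = src.getType() == BufferedImage.TYPE_CUSTOM ? BufferedImage.TYPE_INT_ARGB : src.getType();
        BufferedImage dst = new BufferedImage(h, w, type); // width and height swap
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                // The source's top row becomes the destination's left column, read bottom to top
                dst.setRGB(y, w - 1 - x, src.getRGB(x, y));
            }
        }
        return dst;
    }
}
```

The stretch mkl mentions is a separate resize and is only needed if the rotated pixel dimensions do not already have the displayed 595.1 : 841.9 proportions.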

Azamat

You can try this. I'm using iText 7 for Java. You still need to write your own listener:

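import com.itextpdf.kernel.pdf.canvas.parser.EventType;
import com.itextpdf.kernel.pdf.canvas.parser.data.IEventData;
import com.itextpdf.kernel.pdf.canvas.parser.data.ImageRenderInfo;
import com.itextpdf.kernel.pdf.canvas.parser.listener.IEventListener;
import com.itextpdf.kernel.pdf.xobject.PdfImageXObject;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Set;

public class MyImageRenderListener implements IEventListener {

    // Format pattern for output files with two %s placeholders (object number, extension)
    protected String path;

    protected String extension;

    public MyImageRenderListener(String path) {
        this.path = path;
    }

    @Override
    public void eventOccurred(IEventData data, EventType type) {
        switch (type) {
            case RENDER_IMAGE:
                ImageRenderInfo renderInfo = (ImageRenderInfo) data;
                PdfImageXObject image = renderInfo.getImage();
                if (image == null) {
                    return;
                }
                try {
                    // true = decode the stream and wrap the result in a standard image format
                    byte[] imageBytes = image.getImageBytes(true);
                    extension = image.identifyImageFileExtension();
                    String filename = String.format(path,
                            image.getPdfObject().getIndirectReference().getObjNumber(), extension);
                    // try-with-resources closes the file even if write() fails
                    try (FileOutputStream os = new FileOutputStream(filename)) {
                        os.write(imageBytes);
                    }
                } catch (com.itextpdf.io.exceptions.IOException | IOException e) {
                    System.out.println(e.getMessage());
                }
                break;

            default:
                break;
        }
    }

    // Returning null subscribes the listener to all event types
    @Override
    public Set<EventType> getSupportedEvents() {
        return null;
    }
}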

I tested it with PDFs containing images rotated by an arbitrary angle and by 90 degrees; in both cases the resulting picture was extracted without distortion.

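import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.canvas.parser.PdfCanvasProcessor;

import java.io.IOException;

public void manipulatePdf() throws IOException {
    // Only reading is needed, so no PdfWriter is attached; try-with-resources closes the document
    try (PdfDocument pdfDoc = new PdfDocument(new PdfReader("path to pdf"))) {
        // The path must be a format pattern with two %s placeholders (object number, extension)
        MyImageRenderListener listener = new MyImageRenderListener("path to resulting image");

        PdfCanvasProcessor parser = new PdfCanvasProcessor(listener);
        for (int i = 1; i <= pdfDoc.getNumberOfPages(); i++) {
            parser.processPageContent(pdfDoc.getPage(i));
        }
    }
}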