Extract images with iTextSharp from a PDF in the correct order

Question

I need to extract one image per page of a PDF. I have code, taken from another Stack Overflow question, that extracts the images from a PDF. Sometimes it works perfectly, but other times the images do not come out in the order I expect, where the first image corresponds to the first page of the PDF file, the second image to the second page, and so on. This is the code:

        private static int WriteImageFile(string pdf, string path)
        {
            int nfotos = 0;
            try
            {
                // Get a List of Image
                List<System.Drawing.Image> ListImage = ExtractImages(pdf);
                nfotos = ListImage.Count;
                for (int i = 0; i < ListImage.Count; i++)
                {
                    try
                    {
                        ListImage[i].Save(path + "\\Image" + i + ".bmp", System.Drawing.Imaging.ImageFormat.Bmp);
                    }
                    catch (Exception e)
                    {
                        MessageBox.Show(e.Message);
                    }
                }

            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
            return nfotos;
        }


        private static List<System.Drawing.Image> ExtractImages(String PDFSourcePath)
        {
            List<System.Drawing.Image> ImgList = new List<System.Drawing.Image>();

            iTextSharp.text.pdf.RandomAccessFileOrArray RAFObj = null;
            iTextSharp.text.pdf.PdfReader PDFReaderObj = null;
            iTextSharp.text.pdf.PdfObject PDFObj = null;
            iTextSharp.text.pdf.PdfStream PDFStremObj = null;

            try
            {
                RAFObj = new iTextSharp.text.pdf.RandomAccessFileOrArray(PDFSourcePath);
                PDFReaderObj = new iTextSharp.text.pdf.PdfReader(RAFObj, null);
                Form1 formulario = new Form1();
                for (int i = 0; i <= PDFReaderObj.XrefSize - 1; i++)
                {
                    PDFObj = PDFReaderObj.GetPdfObject(i);

                    if ((PDFObj != null) && PDFObj.IsStream())
                    {
                        PDFStremObj = (iTextSharp.text.pdf.PdfStream)PDFObj;
                        iTextSharp.text.pdf.PdfObject subtype = PDFStremObj.Get(iTextSharp.text.pdf.PdfName.SUBTYPE);

                        if ((subtype != null) && subtype.ToString() == iTextSharp.text.pdf.PdfName.IMAGE.ToString())
                        {
                            try
                            {

                                iTextSharp.text.pdf.parser.PdfImageObject PdfImageObj =
                         new iTextSharp.text.pdf.parser.PdfImageObject((iTextSharp.text.pdf.PRStream)PDFStremObj);

                                System.Drawing.Image ImgPDF = PdfImageObj.GetDrawingImage();


                                ImgList.Add(ImgPDF);
                            }
                            catch (Exception)
                            {

                            }
                        }
                    }
                }
                PDFReaderObj.Close();
            }
            catch (Exception ex)
            {
                throw new Exception(ex.Message);
            }
            return ImgList;
        }

So am I doing something wrong? Or is there a way to know which page is being processed, so that I can associate each image with a PDF page?

You are doing it wrong. You are looping over the objects inside a PDF using brute force. You are not taking into account that an image can be present on more than one page. You aren't examining the content stream of each page to find out the coordinates of each image. Read [ExtractImages](http://developers.itextpdf.com/examples/itext-action-second-edition/chapter-15#562-extractimages.java) and [How to get the co-ordinates of an image?](http://developers.itextpdf.com/question/how-get-co-ordinates-image) — Bruno Lowagie, Dec 14 '15 at 13:15
Thanks for the links! I'm looking at them now, but there is one thing that I don't understand. In `MyImageRenderListener listener = new MyImageRenderListener(RESULT);`, RESULT is a String, while the method `public void renderImage(ImageRenderInfo renderInfo)` expects an `ImageRenderInfo`. Can you please explain this to me? — Ion, Dec 14 '15 at 13:33
The constructor to `MyImageRenderListener` that you are referencing is the path and filename format (`String.Format()`) to save the image to. The `renderImage()` method uses the supplied string to write the image bytes to disk. — Chris Haas, Dec 14 '15 at 14:46
@Chris Haas is correct. Ideally you should read the book for which the examples were written, but since you don't have the book, you'll need to experiment with the examples. You'll need to study [MyImageRenderListener](http://developers.itextpdf.com/examples/itext-action-second-edition/chapter-15#570-myimagerenderlistener.java). — Bruno Lowagie, Dec 14 '15 at 15:33
I just found this question of which the answer should also make a couple of things clear: http://stackoverflow.com/questions/31962472 — Bruno Lowagie, Dec 14 '15 at 16:15
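
The approach described in the comments (parse each page's content stream instead of looping over the cross-reference table) could look roughly like the sketch below. It is not the code from the linked examples, only a minimal C# illustration assuming iTextSharp 5.x. The names `PageImageListener`, `PdfImageExtractor`, `ExtractImagesByPage` and `SaveAll` are hypothetical; `IRenderListener`, `ImageRenderInfo`, `PdfReaderContentParser` and `PdfImageObject` are the iTextSharp types the comments refer to.

```csharp
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical listener: collects the images drawn while ONE page is parsed.
class PageImageListener : IRenderListener
{
    public readonly List<Image> Images = new List<Image>();

    // Called by the parser each time the page's content stream draws an image.
    public void RenderImage(ImageRenderInfo renderInfo)
    {
        PdfImageObject image = renderInfo.GetImage();
        if (image == null) return; // defensive: no image object available

        // May throw if System.Drawing cannot decode the image format.
        Images.Add(image.GetDrawingImage());
    }

    // Text events are irrelevant here, so they are left empty.
    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderText(TextRenderInfo renderInfo) { }
}

// Hypothetical helper class, not part of iTextSharp.
static class PdfImageExtractor
{
    // Returns the images of each page, keyed by 1-based page number.
    public static SortedDictionary<int, List<Image>> ExtractImagesByPage(string pdfPath)
    {
        var result = new SortedDictionary<int, List<Image>>();
        PdfReader reader = new PdfReader(pdfPath);
        try
        {
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // A fresh listener per page keeps the page/image association.
                PageImageListener listener = new PageImageListener();
                parser.ProcessContent(page, listener);
                result[page] = listener.Images;
            }
        }
        finally
        {
            reader.Close();
        }
        return result;
    }

    // Writes every image as <folder>\Page<p>_Image<n>.bmp.
    public static void SaveAll(SortedDictionary<int, List<Image>> imagesByPage, string folder)
    {
        foreach (KeyValuePair<int, List<Image>> entry in imagesByPage)
        {
            for (int n = 0; n < entry.Value.Count; n++)
            {
                string file = Path.Combine(folder, "Page" + entry.Key + "_Image" + n + ".bmp");
                entry.Value[n].Save(file, ImageFormat.Bmp);
            }
        }
    }
}
```

Called as `PdfImageExtractor.SaveAll(PdfImageExtractor.ExtractImagesByPage(pdf), path);`, this would name each file after the page it was drawn on. Within a page, images are reported in the order the content stream draws them, and an image that is reused on several pages is reported once for each of those pages. If the position on the page also matters, `renderInfo.GetImageCTM()` exposes the transformation matrix of each image, as discussed in the second link of the first comment.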
