
I am using iTextSharp and trying to extract images with transparency from a PDF. When I extract an image, the transparency becomes solid black and is lost. I have found multiple examples of image extraction, but all of them seem to have the same issue. The code that I am using is below.

Another example is at itextpdf.com/examples/iia.php?id=284. It includes images in the "results" section at the top. If you click Img7.png, you will see the black border in the image; however, at the bottom of the page there is a link to the original image, info.png, which shows the transparency the way it is supposed to look. This is the exact issue I am running into. Any help or ideas would be appreciated.

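```csharp
using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public class PdfImageExtractor // wrapper class name is illustrative
{
    // Saves every image XObject on one page in its native format.
    // Only the base image is saved: a soft mask (/SMask) referenced by the
    // image dictionary is not read, so transparent areas come out opaque.
    public void ExtractImage(string pdfFile, string imgPath)
    {
        const int pageNumber = 1; // page number to extract the images from
        PdfReader pdf = new PdfReader(pdfFile);
        try
        {
            PdfDictionary pg = pdf.GetPageN(pageNumber);
            PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
            PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
            if (xobj == null)
                return; // the page has no XObjects

            foreach (PdfName name in xobj.Keys)
            {
                PdfObject obj = xobj.Get(name);
                if (!obj.IsIndirect())
                    continue;

                PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
                // Skip form XObjects; only /Subtype /Image entries are images.
                if (!PdfName.IMAGE.Equals(tg.GetAsName(PdfName.SUBTYPE)))
                    continue;

                float width = float.Parse(tg.Get(PdfName.WIDTH).ToString());
                float height = float.Parse(tg.Get(PdfName.HEIGHT).ToString());
                ImageRenderInfo imgRI = ImageRenderInfo.CreateForXObject(
                    new Matrix(width, height), (PRIndirectReference)obj, tg);

                string fileType = imgRI.GetImage().GetFileType();
                string fileName = imgRI.GetRef().Number + "_" + imgRI.GetRef().Generation + "test." + fileType;
                RenderImage(imgRI, Path.Combine(imgPath, fileName));
            }
        }
        finally
        {
            pdf.Close();
        }
    }

    private void RenderImage(ImageRenderInfo renderInfo, string saveImageLocation)
    {
        PdfImageObject image = renderInfo.GetImage();
        using (var dotnetImg = image.GetDrawingImage())
        {
            if (dotnetImg != null)
            {
                dotnetImg.Save(saveImageLocation);
            }
        }
    }
}
```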
user3140134
A fast solution could be to buy or find a tool that doesn't require you to sleep with the PDF spec under your pillow; otherwise, you should follow Bruno's answer. – Hugo Moreno Dec 28 '13 at 17:33

1 Answer


Please read the PDF specification (ISO 32000-1). You are assuming that, for instance, a transparent PNG can be stored inside a PDF as a transparent PNG. That assumption is wrong.

PDF doesn't support PNG as an image type. When a transparent PNG is added to a PDF document, it is converted into two compressed bitmaps. One bitmap is the image you're referring to: the image that allegedly lost its transparency. The other bitmap, which you didn't tell us anything about but which is there, is a mask for this image. When you examine the Image XObject, you'll notice that it has a reference to this mask. This is explained in section 10.3.2 of my book, entitled "Masking images".
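To make the conversion concrete, here is a conceptual C# sketch of how one RGBA pixel buffer becomes two opaque bitmaps: an RGB image and an 8-bit grayscale soft mask. It is not iText's actual implementation, which also compresses the data and handles other PNG variants differently; the class and method names are hypothetical, and 8 bits per channel is assumed.

```csharp
using System;

// Conceptual sketch (hypothetical names, not iText code): split interleaved
// RGBA pixels into the two opaque bitmaps a PDF stores for a transparent image.
public static class PngToPdfSketch
{
    // rgba: width * height * 4 bytes, interleaved R, G, B, A (8 bits each).
    // Returns an opaque RGB bitmap (3 bytes per pixel) for the image XObject
    // and a grayscale bitmap (1 byte per pixel) for its soft mask.
    public static (byte[] Rgb, byte[] Mask) Split(byte[] rgba, int width, int height)
    {
        int pixels = width * height;
        if (rgba.Length != pixels * 4)
            throw new ArgumentException("Expected width * height * 4 bytes.", nameof(rgba));

        var rgb = new byte[pixels * 3];
        var mask = new byte[pixels];
        for (int i = 0; i < pixels; i++)
        {
            rgb[i * 3] = rgba[i * 4];         // R
            rgb[i * 3 + 1] = rgba[i * 4 + 1]; // G
            rgb[i * 3 + 2] = rgba[i * 4 + 2]; // B
            // Alpha becomes a gray level: 0 = fully transparent, 255 = opaque.
            mask[i] = rgba[i * 4 + 3];
        }
        return (rgb, mask);
    }
}
```

Encoders often store fully transparent pixels with RGB values of zero, so the RGB bitmap on its own tends to show those areas as black, which matches the symptom in the question.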

Your assumption that you have a transparent image stored in your PDF documents is wrong. Instead, you have two opaque images, one of which is the mask of the other, in order to achieve transparency. You can't extract these images as a single transparent image. You need to extract both opaque images and merge them into a single transparent image. This is outside the scope of iText(Sharp).
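A minimal sketch of the merge step, assuming both bitmaps have already been decoded to raw 8-bit samples (DeviceRGB for the image, DeviceGray for the mask) with the same width and height. The PDF specification allows a soft mask with different dimensions than its image; this sketch does not handle that case and also ignores /Decode arrays and /Matte. SoftMaskMerger and Merge are hypothetical names.

```csharp
using System;

// Hypothetical helper: recombine an image's decoded samples with its decoded
// soft-mask samples into one RGBA buffer that can be saved as a transparent PNG.
public static class SoftMaskMerger
{
    // rgb:  width * height * 3 bytes (8-bit RGB samples of the base image)
    // mask: width * height bytes (8-bit gray samples of the soft mask)
    public static byte[] Merge(byte[] rgb, byte[] mask, int width, int height)
    {
        int pixels = width * height;
        if (rgb.Length != pixels * 3)
            throw new ArgumentException("Expected width * height * 3 bytes.", nameof(rgb));
        if (mask.Length != pixels)
            throw new ArgumentException("Mask must have one byte per pixel at the same size.", nameof(mask));

        var rgba = new byte[pixels * 4];
        for (int i = 0; i < pixels; i++)
        {
            rgba[i * 4] = rgb[i * 3];         // R
            rgba[i * 4 + 1] = rgb[i * 3 + 1]; // G
            rgba[i * 4 + 2] = rgb[i * 3 + 2]; // B
            rgba[i * 4 + 3] = mask[i];        // mask gray level becomes alpha
        }
        return rgba;
    }
}
```

With iTextSharp, the mask stream is reachable through the image dictionary's PdfName.SMASK entry, and PdfReader.GetStreamBytes can return decoded samples for Flate-compressed streams; a DCTDecode (JPEG) base image would still need to be decoded separately. The RGBA buffer can then be written as a PNG with an imaging library of your choice. Note that System.Drawing's Format32bppArgb stores pixels in B, G, R, A byte order, so R and B would need swapping before copying this buffer into a Bitmap.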

Bruno Lowagie
I have your book and have used it a lot to help with my development. Since .png is supported and PNG supports transparency, I assumed that the transparency was supported as well. Thanks for your input; after merging the two images I am able to achieve the expected outcome. – user3140134 Dec 30 '13 at 14:29