
I would like to parse a PDF, find the logo via known attributes, and, when I find a match, remove that image and copy everything else.

I am using the code below to remove a logo from PDFs that are to be printed on letterhead. It overwrites the logo with a small blank white image, which is drawn at the same position and size. Is there a way to modify this to actually remove the image (and thus save some space, etc.)?

    using System;
    using System.Collections.Generic;
    using System.Drawing;
    using System.Drawing.Imaging;
    using System.IO;
    using iTextSharp.text.pdf;

    internal static class Program
    {
        private static void Main(string[] args)
        {
            ManipulatePdf(@"C:\in.pdf", @"C:\out.pdf");

            Console.WriteLine("Finished - press a key");
            Console.ReadKey();
        }

        public static void ManipulatePdf(string src, string dest)
        {
            Console.WriteLine("Start");
            PdfReader reader = new PdfReader(src);

            // Each page carries its own resources; this workaround only inspects page 1.
            PdfDictionary page = reader.GetPageN(1);
            PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
            PdfDictionary xobjects = resources.GetAsDict(PdfName.XOBJECT);

            foreach (PdfName pdfName in xobjects.Keys)
            {
                // Skip entries that do not resolve to a stream.
                PRStream stream = xobjects.GetAsStream(pdfName) as PRStream;
                if (stream == null)
                {
                    continue;
                }

                // Crude heuristic: any XObject stream over 100,000 bytes is treated as the logo.
                if (stream.Length > 100000)
                {
                    PdfImage image = new PdfImage(MakeBlankImg(), "", null);
                    Console.WriteLine("Calling replace stream");
                    ReplaceStream(stream, image);
                }
            }

            // Writing through a stamper saves the modified document to dest.
            PdfStamper stamper = new PdfStamper(reader, new FileStream(dest, FileMode.Create));
            stamper.Close();
            reader.Close();
        }

        // Builds a 1x1 white JPEG and wraps it as an iTextSharp image.
        public static iTextSharp.text.Image MakeBlankImg()
        {
            Console.WriteLine("Making small blank image");
            byte[] array;

            using (MemoryStream ms = new MemoryStream())
            {
                using (Bitmap newBi = new Bitmap(1, 1))
                {
                    using (Graphics g = Graphics.FromImage(newBi))
                    {
                        g.Clear(Color.White);
                        g.Flush();
                    }
                    newBi.Save(ms, ImageFormat.Jpeg);
                }

                array = ms.ToArray();
            }
            Console.WriteLine("Image array is " + array.Length + " bytes.");

            return iTextSharp.text.Image.GetInstance(array);
        }

        // Overwrites the data and dictionary entries of orig with those of stream.
        public static void ReplaceStream(PRStream orig, PdfStream stream)
        {
            orig.Clear();
            using (MemoryStream ms = new MemoryStream())
            {
                stream.WriteContent(ms);
                orig.SetData(ms.ToArray(), false);
            }

            Console.WriteLine("Iterating keys");

            foreach (KeyValuePair<PdfName, PdfObject> keyValuePair in stream)
            {
                Console.WriteLine("Key: " + keyValuePair.Key.ToString());

                orig.Put(keyValuePair.Key, stream.Get(keyValuePair.Key));
            }
        }
    }
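For comparison, here is a minimal sketch of dropping the image entry instead of overwriting it. It assumes iTextSharp 5.x; the class name `LogoRemover`, the method name `RemoveLargeImages`, and the `minBytes` parameter are hypothetical, and the size test is the same crude heuristic as above. It only removes the resource entry, so each page's content stream still contains the `Do` operator that names the removed image, and a viewer may warn about that.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

public static class LogoRemover
{
    // Hypothetical helper: removes every image XObject larger than minBytes
    // from the resources of every page, then saves the result to dest.
    public static void RemoveLargeImages(string src, string dest, int minBytes)
    {
        PdfReader reader = new PdfReader(src);

        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            PdfDictionary resources = reader.GetPageN(i).GetAsDict(PdfName.RESOURCES);
            PdfDictionary xobjects = resources == null ? null : resources.GetAsDict(PdfName.XOBJECT);
            if (xobjects == null)
            {
                continue;
            }

            // Copy the keys first so the dictionary can be modified inside the loop.
            foreach (PdfName name in new List<PdfName>(xobjects.Keys))
            {
                PRStream stream = xobjects.GetAsStream(name) as PRStream;
                if (stream == null)
                {
                    continue;
                }

                bool isImage = PdfName.IMAGE.Equals(stream.GetAsName(PdfName.SUBTYPE));
                if (isImage && stream.Length > minBytes)
                {
                    xobjects.Remove(name);
                }
            }
        }

        // Drop objects that are no longer reachable, such as the removed image stream.
        reader.RemoveUnusedObjects();

        PdfStamper stamper = new PdfStamper(reader, new FileStream(dest, FileMode.Create));
        stamper.Close();
        reader.Close();
    }
}
```

A call such as `LogoRemover.RemoveLargeImages(@"C:\in.pdf", @"C:\out.pdf", 100000)` would mirror the threshold used in the workaround.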
Robb Sadler
  • In reading the PDF spec I saw "An indirect reference to an undefined object shall not be considered an error by a conforming reader; it shall be treated as a reference to the null object." (http://www.adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/pdf_reference_1-7.pdf). So it seems that I can simply eliminate the image object and the reader should not complain but simply skip references to that image. So this may be simpler than I thought. – Robb Sadler Feb 27 '15 at 17:50
  • Yeah, you should be able to use `KillIndirect`, you can see it used here: http://stackoverflow.com/a/8751517/231316 – Chris Haas Feb 27 '15 at 18:05
  • Ok -- tried that, and Acrobat Reader and Acrobat both tell me "An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem." I also tried separately to just delete the object stream manually with a text editor and got the same error. However, the outcome is good: the first image is gone and the second one I want to keep is there. Everything displays correctly. Can I safely ignore this as long as my printer doesn't care? – Robb Sadler Feb 27 '15 at 18:35
  • Using the iTextSharp solution suggested and integrating information from the post at: http://itext-general.2136553.n4.nabble.com/KillIndirect-functions-creates-erroneous-Pdf-td2529593.html, I was able to delete images and keep Acrobat from complaining, but I am still sorting out how to tell which "Do" I am looking at and whether it references the image I wish to delete. I will need to read the spec a bit more and do a little more experimentation. – Robb Sadler Feb 27 '15 at 21:53
  • The closest thing I can find that will get me where I need to go seems to be the RenderListener in iText(Sharp). I am attempting to use that to find my image and then possibly adapt the code example referenced in http://stackoverflow.com/questions/26580912/pdf-convert-to-black-and-white-pngs/26756323#26756323 in @BrunoLowagie's answer to remove the image instead of replacing it. Yeah, I bought the book, but hopefully this issue is solved before I have waited for it to be delivered. – Robb Sadler Mar 02 '15 at 17:48
  • @ChrisHaas, can you throw this dog a bone? I think the only thing missing with your suggestion is that the xref needs to be updated, but I am still definitely in the discovery stages for iTextSharp, so I could be wrong. – Robb Sadler Mar 02 '15 at 23:08
  • Sorry Robb, you've got a lot of code and text up there and I just haven't had a chance to dig through it. I'm swamped today but I'll see if I can look into it tonight. – Chris Haas Mar 03 '15 at 14:25
  • There - lots less code - only the code I am currently using as a workaround. Hopefully this makes it easier to read. – Robb Sadler Mar 04 '15 at 18:29
  • Ok, $40 later, I have a book http://smile.amazon.com/gp/product/1935182617/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1 that covers a lot of things regarding iText, but not KillIndirect (not even mentioned in the index) or how to use it on an individual image and fix up and remove the Do command that references it. Time to move on and just use the workaround. I still appreciate that I have a reference for a tool that is very useful. iTextSharp doesn't owe me anything... – Robb Sadler Mar 05 '15 at 17:50
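
The open point in the comments is how to drop the `Do` operator that paints a specific image. A minimal sketch follows, assuming the page content is available as text and that the image's resource name (for example `Im0`) is already known; the class `ContentStreamSketch` and the method `StripDoOperator` are hypothetical names. It is a plain text match rather than a real content-stream parser, so it could also hit the same characters inside a string literal or inline image data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ContentStreamSketch
{
    // Hypothetical helper: removes every "/<name> Do" invocation from
    // content-stream text and leaves all other operators untouched.
    public static string StripDoOperator(string content, string xobjectName)
    {
        // Match the name token, whitespace, then the Do operator followed
        // by whitespace or the end of the text.
        string pattern = "/" + Regex.Escape(xobjectName) + @"\s+Do(?=\s|$)";
        return Regex.Replace(content, pattern, string.Empty);
    }
}
```

For example, `StripDoOperator("q 200 0 0 80 36 700 cm /Im0 Do Q", "Im0")` leaves only the surrounding `q ... cm` and `Q`, which paint nothing. With iTextSharp 5.x the text could presumably come from `reader.GetPageContent(pageNumber)` and go back through `reader.SetPageContent(pageNumber, bytes)`, decoded and re-encoded with a one-byte-per-character encoding such as ISO-8859-1 so that binary data survives the round trip.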

0 Answers