Extract Images from PDF via iTextSharp 4.1.6.0

Question

Hello all (and you too, Bruno :) ).
I'm using iTextSharp 4.1.6.0, ported to Xamarin.Android.
For some reason I need to extract images from a PDF.
I found a lot of examples, but they don't seem usable in my case, because some classes (like
ImageCodecInfo, ImageRenderInfo, System.Drawing.Imaging.EncoderParameters, PdfImageObject, etc.) don't exist.

But one example looks fine; here it is:

void ExtractJpeg(string file)
{
    var dir1 = Path.GetDirectoryName(file);
    var fn = Path.GetFileNameWithoutExtension(file);
    var dir2 = Path.Combine(dir1, fn);
    if (!Directory.Exists(dir2)) Directory.CreateDirectory(dir2);

    var pdf = new PdfReader(file);
    int n = pdf.NumberOfPages;
    for (int i = 1; i <= n; i++)
    {
        var pg = pdf.GetPageN(i);
        var res = PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES)) as PdfDictionary;
        var xobj = PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT)) as PdfDictionary;
        if (xobj == null) continue;

        var keys = xobj.Keys;
        if (keys.Count == 0) continue;

        var obj = xobj.Get(keys.ElementAt(0));
        if (!obj.IsIndirect()) continue;

        var tg = PdfReader.GetPdfObject(obj) as PdfDictionary;
        var type = PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE)) as PdfName;
        if (!PdfName.IMAGE.Equals(type)) continue;

        int XrefIndex = (obj as PRIndirectReference).Number;
        var pdfStream = pdf.GetPdfObject(XrefIndex) as PRStream;
        var data = PdfReader.GetStreamBytesRaw(pdfStream);
        var jpeg = Path.Combine(dir2, string.Format("{0:0000}.jpg", i));
        File.WriteAllBytes(jpeg, data);
    }
}

The problem is in this line:

var obj = xobj.Get(keys.ElementAt(0));

Error log:

The type arguments for method `System.Linq.ParallelEnumerable.ElementAt(this System.Linq.ParallelQuery, int)' cannot be inferred from the usage. Try specifying the type arguments explicitly

I have no idea how to work around this. Can someone explain it to me?

Also, I would like to know whether there is another way to extract images from a PDF.
Thanks!!

score 3 · Accepted Answer · edited May 23 '17 at 11:45

First, the obligatory speech about upgrading from old, obsolete and no longer officially supported software:

Please upgrade to the most recent version of iTextSharp. I know that you're going to say that you can't use iText's new license, but please read their sales FAQ, specifically the "Why shouldn't I use..." section, which addresses 4.1.6. Please remember that in most countries accepting the license actually enters you into a legal contract, so I would also have someone with legal experience read that, too. Since you say that you are using Xamarin, I'm thinking that you are submitting this to a store, too, so this is even more important because the problems can multiply very fast.

Also, there's a new version of PDF coming out pretty soon and you'll probably want to be on track to support that, too.

Second, your code makes a giant and incorrect assumption: that all images in a PDF are JPEGs. See this post and this post for a bit of discussion on it. Maybe your PDFs are all JPEGs, so this works for you, but there's a good chance that this will break "tomorrow".
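If you want the extraction code to fail safe instead of writing non-JPEG bytes into a `.jpg` file, one option is to check the data before saving it. Inside the PDF, the more direct test is whether the image stream's `/Filter` names `/DCTDecode`; the sketch below instead sniffs the bytes themselves so it runs without iTextSharp. `LooksLikeJpeg` is a hypothetical helper, not an iTextSharp API, and it assumes the raw stream bytes are the JPEG file itself (no other filter stacked on top of DCT). It is written as a C# 9+ top-level program for brevity.

```csharp
using System;

// Sample inputs: the first bytes of a JFIF file, and a zlib header such as a
// /FlateDecode image stream would typically start with.
byte[] jpegStart = { 0xFF, 0xD8, 0xFF, 0xE0 };
byte[] zlibStart = { 0x78, 0x9C, 0x01, 0x02 };

Console.WriteLine(LooksLikeJpeg(jpegStart)); // True
Console.WriteLine(LooksLikeJpeg(zlibStart)); // False

// Hypothetical helper (not part of iTextSharp): true when the bytes start with
// the JPEG start-of-image marker FF D8, followed by the 0xFF byte that begins
// the next marker. This is a quick sniff, not a full JPEG validation.
static bool LooksLikeJpeg(byte[] data) =>
    data != null && data.Length >= 3 &&
    data[0] == 0xFF && data[1] == 0xD8 && data[2] == 0xFF;
```

In the question's loop, you would call it on `data` and only write the `.jpg` file when it returns true, skipping (or separately handling) everything else.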

Third, I can't get ElementAt to work with a non-generic ICollection: LINQ's ElementAt is an extension method for the generic IEnumerable<T>, so the compiler has no type argument to infer. I don't know if I'm missing an extension or a using somewhere, but it appears that you copied the code from a five-year-old post here that came from a six-year-old post here. I'm also not sure why the "first" element is needed anyway; that's weird. The solution is to just loop over the keys instead of trying to explicitly grab one. Instead of:

var obj = xobj.Get(keys.ElementAt(0));
//...
File.WriteAllBytes(jpeg, data);

Loop over each key:

foreach (PdfName k in keys) {
    var obj = xobj.Get(k);
    //...
    File.WriteAllBytes(jpeg, data);
}

This small change will make us all cry, but it should at least make image extraction work.
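To see why the foreach version compiles while `keys.ElementAt(0)` does not, the sketch below reproduces the situation without iTextSharp, using `Hashtable.Keys` (a non-generic `ICollection`) as a stand-in for the XObject dictionary's keys; the page numbers and image names are made up for illustration. It shows that `Cast<T>()` would also make `ElementAt` compile, and that `foreach` needs no LINQ at all. It also adds a per-page image counter to the file name, because once you loop over every key, all images on the same page would otherwise be written to the same `{0:0000}.jpg` file and overwrite each other. Like the other sketch, it is a C# 9+ top-level program.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Made-up stand-in for per-page XObject dictionaries: page number -> keys.
// Hashtable.Keys is a non-generic ICollection, the kind of collection on which
// keys.ElementAt(0) fails to compile.
var xobjectKeysByPage = new SortedDictionary<int, ICollection>
{
    [1] = new Hashtable { ["Im0"] = null, ["Im1"] = null }.Keys,
    [2] = new Hashtable { ["Im0"] = null }.Keys,
};

// Option 1: Cast<T>() yields an IEnumerable<T>, so ElementAt's type argument can be inferred.
string firstKeyOnPage2 = xobjectKeysByPage[2].Cast<string>().ElementAt(0);
Console.WriteLine(firstKeyOnPage2); // Im0

// Option 2: loop over every key; foreach works on non-generic collections too.
var fileNames = new List<string>();
foreach (var page in xobjectKeysByPage)
{
    int imageIndex = 0;
    foreach (string key in page.Value)
    {
        // In the real loop, this is where xobj.Get(k) and the image checks go.
        // Page number plus a per-page counter keeps two images on the same page
        // from overwriting each other, which "{0:0000}.jpg" alone would allow.
        fileNames.Add(string.Format("{0:0000}_{1:00}.jpg", page.Key, imageIndex++));
    }
}

Console.WriteLine(string.Join(", ", fileNames)); // 0001_00.jpg, 0001_01.jpg, 0002_00.jpg
```

The counter is used instead of the key name because Hashtable key order is unspecified; any scheme works as long as names are unique per image.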

Thanks for the great answer, mate! First, I understand that I need to update my iTextSharp version, but the main problem is that I don't know how to integrate it into my platform. Second, yes, I know for certain that the images in my PDFs are JPEGs, so that is my case. And finally, yes :))), I got this snippet from a five-year-old comment, and you're right about the foreach loop. Thanks again! — XTL, Nov 13 '15 at 08:02
As far as I understand, Xamarin is some sort of Mono for Android, so you can use .NET stuff on Android? You'll be happy to learn that iText runs natively on Android as Java. To be more precise, there is iTextG, which is a stripped-down version of iText, where we removed AWT and other stuff that does not exist on Android. http://itextpdf.com/product/itextg — Amedee Van Gasse, Nov 13 '15 at 10:46
