How to convert PDF to Image in c#?

Question

I want to convert the pages of a PDF to PNG images. I know my code is not correct, but I couldn't figure out what to do. Also, I want to use only iTextSharp.

Here is source code:

    public void PDFDisplay(string DosyaAdi, int PerNr, int ID, int FileId, string message)
    {

        string filepath = Server.MapPath(@"~/Content/Egitim/Files/") + DosyaAdi;
        PdfReader pdfReader = new PdfReader(filepath);
        int numberOfPages = pdfReader.NumberOfPages;
        string path = Server.MapPath(@"~/Content/Egitim/Slides/" + DosyaAdi + "/");
        Directory.CreateDirectory(path);

        System.Drawing.Image[] image1 = new System.Drawing.Image[numberOfPages];
        for (int i = 1; i < numberOfPages; i++)
        {
            byte[] pdfPage = pdfReader.GetPageContent(i);
            using (MemoryStream ms = new MemoryStream(pdfPage))
            {
                image1[i] = System.Drawing.Image.FromStream(ms);//error occurs here. Invalid parameter (ms)
            }
            image1[i].Save(path, System.Drawing.Imaging.ImageFormat.Png);

        }
    }

Any ideas would be appreciated, thank you.

What's wrong with this code? What issue are you facing with it? — Chetan, Jul 10 '18 at 11:11
@Ceren post the full exception, including its call stack. You can get it easily with `Exception.ToString()`. People can't guess what's wrong just by looking at the code. Also *debug* your code. Only you can check whether `pdfPage` is null or empty — Panagiotis Kanavos, Jul 10 '18 at 11:14
BTW are you *sure* that `GetPageContent()` will return an *image*? A page typically contains text, not images — Panagiotis Kanavos, Jul 10 '18 at 11:16
@PanagiotisKanavos No, I'm not sure, you are right. It doesn't return an image, but can't the content be converted to an image? I'm using a byte array, and I thought a byte array could be an image. I think I should look for other options. Thank you :) — Ceren, Jul 10 '18 at 11:17
ArgumentException is thrown when either stream is null or stream does not have a valid image format [msdn docs](https://msdn.microsoft.com/en-us/library/93z9ee4x(v=vs.110).aspx#Anchor_1). Please check what pdfReader.GetPageContent returns as mentioned by Panagiotis Kanavos above. — nilsK, Jul 10 '18 at 11:20
@Ceren Add a `try/catch` block and log the exception there, instead of trying to look what it contains in the Watch window. Code can always fail eg because the PDF doesn't contain images or because the file is corrupted. You'll have to catch and log the exception at least — Panagiotis Kanavos, Jul 10 '18 at 11:22
Please stop commenting if you don't know what is returned by the `GetPageContent()` method. Read my answer, and be aware that I voted to close this question as "off topic", as questions asking "Why isn't this code working" are not allowed on Stack Overflow. — Bruno Lowagie, Jul 10 '18 at 11:23

score 2 · Accepted Answer · answered Jul 10 '18 at 11:21

You are making the assumption that iText can convert PDF syntax (vector data) to an image (raster image). That assumption is wrong. iText does not convert PDF to images!

You are using the GetPageContent() method. This method gets the content stream of a page. That content stream consists of operators and operands that change the graphics state and the text state, and, by doing so, define what is drawn on a page.
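To make this concrete, here is a minimal sketch of why `Image.FromStream` rejects those bytes. It does not use iTextSharp at all: the `pageContent` value is a hypothetical, hand-written sample of what a decoded content stream might look like, and `ContentStreamDemo` and `LooksLikePng` are names invented for this illustration. A real page stream will differ, but it will still be PDF syntax rather than an image file.

```csharp
using System;
using System.Text;

public static class ContentStreamDemo
{
    // The eight-byte signature that every PNG file starts with.
    private static readonly byte[] PngSignature =
        { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Returns true only if the data begins with the PNG signature.
    public static bool LooksLikePng(byte[] data)
    {
        if (data == null || data.Length < PngSignature.Length)
        {
            return false;
        }

        for (int i = 0; i < PngSignature.Length; i++)
        {
            if (data[i] != PngSignature[i])
            {
                return false;
            }
        }

        return true;
    }

    public static void Main()
    {
        // Hypothetical sample (an assumption, not output captured from iTextSharp):
        // text operators that select a font, move the cursor, and show a string.
        byte[] pageContent = Encoding.ASCII.GetBytes("BT /F1 12 Tf 72 720 Td (Hello) Tj ET");

        // The bytes are drawing instructions, so the image check fails.
        Console.WriteLine(LooksLikePng(pageContent)); // prints "False"

        // Printing them as text shows the operators and operands.
        Console.WriteLine(Encoding.ASCII.GetString(pageContent));
    }
}
```

An image decoder expects a recognized file header such as the PNG signature, so passing it a stream of drawing instructions fails with the "invalid parameter" error from the question.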

However, this page stream is far from sufficient to draw a page, since each page also refers to plenty of resources:

- Images are usually stored in separate PDF objects. The page stream refers to them, but doesn't contain them.
- Part of the syntax can be stored in an external object, referred to as a Form XObject. The page stream refers to these external objects, but doesn't contain them.
- Annotations (such as widget annotations for form fields, text annotations,...) aren't part of the page stream. Annotations are added as a layer on top of the page. The /Annots entry of the page dictionary refers to the annotations on a specific page.
- Fonts are never part of the page stream. The syntax refers to a font by a name that is a key in the /Font entry of the page resources.
- ...

In short: it is normal that your code can't work. The answer to your question "How to convert PDF to Image in c#?" is: Not with iText!

If you change the question into: which tool can I use instead, then your question becomes off-topic, as the Stack Overflow FAQ clearly states that you can't post a question asking for recommendations about a tool, library,...

However, if I can give one recommendation: iText uses Ghostscript in tests. Look at the source code of iText on GitHub to get an idea on how to do it. In particular look at the CompareTool class. — Amedee Van Gasse, Jul 10 '18 at 11:43
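For readers who follow the Ghostscript suggestion, the sketch below shows one way to call the Ghostscript command-line tool from C#. This is not iTextSharp and not how iText's CompareTool is implemented; it only assumes that Ghostscript is installed separately. The class name, method names, and executable name are assumptions made for this illustration, while the switches (`-dSAFER`, `-dBATCH`, `-dNOPAUSE`, `-sDEVICE=png16m`, `-r`, `-sOutputFile`) are standard Ghostscript options.

```csharp
using System;
using System.Diagnostics;

public static class GhostscriptSketch
{
    // Builds the command line for rendering every page of a PDF to PNG files.
    // outputPattern may contain %03d, which Ghostscript replaces with the page number.
    public static string BuildArguments(string pdfPath, string outputPattern, int dpi)
    {
        return "-dSAFER -dBATCH -dNOPAUSE -sDEVICE=png16m -r" + dpi +
               " \"-sOutputFile=" + outputPattern + "\" \"" + pdfPath + "\"";
    }

    // Runs Ghostscript and returns its exit code (0 normally means success).
    // ghostscriptExe is an assumption: typically "gswin64c.exe" on Windows or "gs" on Linux.
    public static int Render(string ghostscriptExe, string pdfPath, string outputPattern, int dpi)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = ghostscriptExe,
            Arguments = BuildArguments(pdfPath, outputPattern, dpi),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }

    public static void Main()
    {
        // Only prints the command line; call Render(...) to actually invoke Ghostscript.
        Console.WriteLine(BuildArguments("input.pdf", "page-%03d.png", 150));
    }
}
```

In the question's setting, `Render` would be called with the mapped file path and an output pattern inside the slides directory. Check Ghostscript's license terms before shipping it with an application.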
