Convert PDF to JPEG using a free C# solution

Question

I need to convert a PDF file to JPEG using C#, and the solution (library) has to be free.

I have searched a lot but have not found anything clear.

I already tried iTextSharp and PDFBox (although its pdf2image tool is Java-only, I think) without success.

I also tried to extract the images from the PDF individually, but I get an "invalid parameter" error when I try to extract them. They seem to have a strange encoding.

Can anyone recommend a library for saving a PDF as JPEG? Examples would be much appreciated too.

Ghostscript is not free (https://www.ghostscript.com/license.html); only the scripts around it are. A Ghostscript license itself costs $25,000 plus $0.25 per client (https://archive.sap.com/discussions/thread/3958792), or you need to open-source your code. – Nasenbaer, Oct 09 '18 at 13:13

Vijay Gill · Accepted Answer · 2017-04-22T11:49:17.473

29

The PdfiumViewer library might be helpful here. It is also available as a NuGet package.

Create a new WinForms app and add the "PdfiumViewer" NuGet package to it.
This will also add two native DLLs, both named "pdfium.dll", in the folders x86 and x64 of your project. Set "Copy to Output Directory" to "Copy Always" for both.

Try out the following code (change the paths to suit your setup).

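    // Requires: using System; using System.Drawing.Imaging;
    try
    {
        using (var document = PdfiumViewer.PdfDocument.Load(@"input.pdf"))
        // Page index is 0-based: 0 renders the first page, here at 300 x 300 DPI.
        using (var image = document.Render(0, 300, 300, true))
        {
            image.Save(@"output.png", ImageFormat.Png);
        }
    }
    catch (Exception ex)
    {
        // handle exception here
    }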

Edit 2: Changed code to show that page index is 0 based as pointed out in comment by S.C. below
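
Since the question asks for JPEG and usually for every page, the same calls can be put in a loop. This is only a minimal sketch, assuming the PdfiumViewer NuGet package and its native pdfium.dll are set up as described above; the `PdfToJpeg` class, the `Convert` method and the `page-001.jpg` naming scheme are made up for illustration.

```csharp
using System.Drawing.Imaging;
using System.IO;
using PdfiumViewer;

static class PdfToJpeg
{
    // Hypothetical helper: renders each page of pdfPath to outputDir/page-NNN.jpg.
    public static void Convert(string pdfPath, string outputDir, float dpi = 300)
    {
        Directory.CreateDirectory(outputDir);

        using (var document = PdfDocument.Load(pdfPath))
        {
            // Render takes a 0-based page index.
            for (int page = 0; page < document.PageCount; page++)
            {
                using (var image = document.Render(page, dpi, dpi, true))
                {
                    // File names are 1-based so they match the page numbers a reader sees.
                    string target = Path.Combine(outputDir, $"page-{page + 1:D3}.jpg");

                    // Pass ImageFormat.Jpeg explicitly so the file content matches the extension.
                    image.Save(target, ImageFormat.Jpeg);
                }
            }
        }
    }
}
```

A call such as `PdfToJpeg.Convert(@"input.pdf", @"out")` would then write one JPEG per page into the `out` folder. Lowering `dpi` gives smaller files at the cost of sharpness.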

Edit 1: Updated solution ~~Have you tried pdfsharp?~~

This link might be helpful

edited Apr 22 '17 at 11:49

answered Jul 21 '11 at 11:13


@Vijay Gill Hi. Thanks for the recommendation, but it isn't working. It extracts the images, but the format is not readable; the images don't seem to be in JPEG format. I think exporting every single image is harder than exporting the entire PDF page as an image (either method is fine for what I want). – FrioneL Jul 22 '11 at 11:43
Open the images in a hex editor and look at the first few bytes/characters to guess the format from its signature. For example, BMP has BM, JPEG has JFIF, and PNG has PNG in the first few bytes. That might help you identify the format. – Vijay Gill Jul 22 '11 at 12:09
@Vijay Gill I did that, but I don't understand what I get. The first 5 hexadecimal bytes are "58 09 ec ed 07", and their text translation is incomprehensible. I don't think it is an image... – FrioneL Jul 25 '11 at 08:05
I finally got it working with a Ghostscript library :) – FrioneL Jul 26 '11 at 10:32
PDFsharp cannot convert PDF pages to images. It can extract the images embedded in your PDF, but it cannot render pages. See http://www.pdfsharp.net/wiki/ExportImages-sample.ashx which says: PDFsharp cannot render PDF pages - not to printers, not to bitmaps, not to JPEG files. – saurabhj Jul 03 '16 at 18:29
@saurabhj: What I understood from the original question was that the OP wanted to extract images and not render the PDF pages to images. Anyway, I have edited my solution to use different library to suit the question now. You may up-vote if you like it:) – Vijay Gill Jul 06 '16 at 21:00
@VijayGill Fair enough. I have removed the down-vote. I just wanted to prevent lots of people spending unnecessary time trying to get PDFSharp to work when it is not at all possible. – saurabhj Jul 07 '16 at 04:58
It's worth noting that at least at this point `Render` appears to accept a 0-based page index, so the example here would render the second page, not the first. – S.C. Apr 21 '17 at 17:45
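
The signature check suggested in the comments above can also be done in code instead of a hex editor. The following is a minimal sketch, assuming only the well-known leading bytes of JPEG (FF D8 FF), PNG (89 50 4E 47) and BMP (42 4D); `ImageSniffer` and `GuessFormat` are hypothetical names and not part of any library mentioned in this thread.

```csharp
using System;
using System.IO;
using System.Linq;

static class ImageSniffer
{
    // Guesses the image format from the leading bytes of a file.
    public static string GuessFormat(byte[] header)
    {
        if (StartsWith(header, 0xFF, 0xD8, 0xFF)) return "jpeg";
        if (StartsWith(header, 0x89, 0x50, 0x4E, 0x47)) return "png";
        if (StartsWith(header, 0x42, 0x4D)) return "bmp";
        return "unknown";
    }

    // Reads up to 8 bytes from the file and classifies them.
    public static string GuessFormat(string path)
    {
        var header = new byte[8];
        using (var stream = File.OpenRead(path))
        {
            int read = stream.Read(header, 0, header.Length);
            Array.Resize(ref header, read); // keep only the bytes actually read
        }
        return GuessFormat(header);
    }

    // True if data begins with the given magic bytes.
    private static bool StartsWith(byte[] data, params byte[] magic)
    {
        return data.Length >= magic.Length
            && data.Take(magic.Length).SequenceEqual(magic);
    }
}
```

For the bytes quoted in the thread, `ImageSniffer.GuessFormat(new byte[] { 0x58, 0x09, 0xEC, 0xED, 0x07 })` returns `"unknown"`, since they match none of these three signatures.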

score 2 · Answer 2 · edited May 14 '13 at 19:46

This is how I did it with PDFLibNet:

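    // Requires: using System; using System.Drawing; using System.Drawing.Imaging; using System.IO;
    public void ConvertPDFtoHojas(string filename, string dirOut)
    {
        PDFLibNet.PDFWrapper _pdfDoc = new PDFLibNet.PDFWrapper();
        try
        {
            _pdfDoc.LoadPDF(filename);

            for (int i = 0; i < _pdfDoc.PageCount; i++)
            {
                using (Image img = RenderPage(_pdfDoc, i))
                {
                    // Pass the format explicitly: Image.Save(path) alone writes PNG data
                    // for an in-memory bitmap, whatever the file extension says.
                    string name = string.Format("{0}{1}.jpg", i, DateTime.Now.ToString("mmss"));
                    img.Save(Path.Combine(dirOut, name), ImageFormat.Jpeg);
                }
            }
        }
        finally
        {
            // Release the document even if rendering or saving throws
            _pdfDoc.Dispose();
        }
    }

    public Image RenderPage(PDFLibNet.PDFWrapper doc, int page)
    {
        // The caller's index starts at 0, so CurrentPage is set to page + 1
        doc.CurrentPage = page + 1;
        doc.CurrentX = 0;
        doc.CurrentY = 0;

        doc.RenderPage(IntPtr.Zero);

        // Create an image to draw the page into
        var buffer = new Bitmap(doc.PageWidth, doc.PageHeight);
        doc.ClientBounds = new Rectangle(0, 0, doc.PageWidth, doc.PageHeight);
        using (var g = Graphics.FromImage(buffer))
        {
            var hdc = g.GetHdc();
            try
            {
                doc.DrawPageHDC(hdc);
            }
            finally
            {
                g.ReleaseHdc();
            }
        }
        return buffer;
    }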

I was able to get this code working using PDFLibNet from the package manager. I modified this line of code: PDFLibNet64.PDFWrapper _pdfDoc = new PDFLibNet64.PDFWrapper(); – pianocomposer, Apr 20 '21 at 17:05
@jorge-alberto-chavez-barrera Can you change this code to keep the image in memory and return it directly from a handler? – Hadi Ranji, Jul 14 '22 at 11:48
