59

I need to convert PDF files to images. If the PDF file has multiple pages, I just need one image that contains all of the PDF pages.

Is there an open-source solution that is free of charge, unlike the Acrobat product?

Peter Mortensen
loveForEver

15 Answers

31

The thread "converting PDF file to a JPEG image" is suitable for your request.

One solution is to use a third-party library. ImageMagick is very popular and freely available. A .NET wrapper is available for it, and ImageMagick itself can be downloaded from its own download page.

And you also can take a look at the thread "How to open a page from a pdf file in pictureBox in C#".

If you convert the PDF to TIFF this way, you can use the following class to retrieve the individual page bitmaps from the TIFF.

using System.Collections;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public class TiffImage
{
    // One Bitmap per TIFF frame (page), in page order.
    public ArrayList myImages = new ArrayList();

    public TiffImage(string path)
    {
        using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read))
        using (Image myImage = Image.FromStream(fs))
        {
            // The first frame dimension of a multi-page TIFF is the page dimension.
            FrameDimension myDimension = new FrameDimension(myImage.FrameDimensionsList[0]);
            int myPageCount = myImage.GetFrameCount(myDimension);
            for (int i = 0; i < myPageCount; i++)
            {
                myImage.SelectActiveFrame(myDimension, i);
                // Copy the active frame into an independent Bitmap. A Bitmap built
                // directly on a stream needs that stream to stay open, so the
                // original approach of closing the stream right away was unsafe.
                myImages.Add(new Bitmap(myImage));
            }
        }
    }
}

Use it like so:

private void button1_Click(object sender, EventArgs e)
{
    TiffImage myTiff = new TiffImage("D:\\Some.tif");
    // pictureBox1..3 are PictureBox controls; the indexer returns the Bitmap
    // stored at that position in the myImages ArrayList of the TiffImage.
    // This assumes the TIFF has at least three pages.
    this.pictureBox1.Image = (Bitmap)myTiff.myImages[0];
    this.pictureBox2.Image = (Bitmap)myTiff.myImages[1];
    this.pictureBox3.Image = (Bitmap)myTiff.myImages[2];
}
stimms
Gaurav Deochakke
  • 2
    The .NET Wrapper has a Nuget package as well – Icad Nov 30 '21 at 09:44
  • 3
    `6 Ways to Convert a PDF to a JPG Image` that is a link to an article that has nothing to do with programming or C#, just ways to convert PDFs manually using online tools – Alex P. Mar 14 '23 at 18:13
25

You can use Ghostscript to convert PDF to images.

To use Ghostscript from .NET, take a look at the Ghostscript.NET library (a managed wrapper around the Ghostscript library).

To produce an image from a PDF with Ghostscript.NET, take a look at RasterizerSample.

To combine multiple images into a single image, check out this sample: http://www.niteshluharuka.com/2012/08/combine-several-images-to-form-a-single-image-using-c/#
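
A minimal sketch of that last step, assuming the pages have already been rasterized to System.Drawing images by whichever library you choose. The names PageStitcher and StackVertically are made up for illustration; they are not part of Ghostscript.NET or the linked article.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Linq;

public static class PageStitcher
{
    // Stacks the page images top to bottom on a single white canvas.
    // The caller owns (and should dispose) the returned Bitmap.
    public static Bitmap StackVertically(IReadOnlyList<Image> pages)
    {
        if (pages == null || pages.Count == 0)
            throw new ArgumentException("At least one page image is required.", nameof(pages));

        // The canvas is as wide as the widest page and as tall as all pages together.
        int width = pages.Max(p => p.Width);
        int height = pages.Sum(p => p.Height);

        var combined = new Bitmap(width, height);
        using (Graphics g = Graphics.FromImage(combined))
        {
            g.Clear(Color.White);
            int y = 0;
            foreach (Image page in pages)
            {
                // Passing an explicit pixel size avoids DPI-dependent scaling.
                g.DrawImage(page, 0, y, page.Width, page.Height);
                y += page.Height;
            }
        }
        return combined;
    }
}
```

The result can be written out with combined.Save(path, ImageFormat.Png). For very long documents a single bitmap may run into memory or size limits, in which case a multi-page TIFF is probably the better target.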

Emily
HABJAN
  • +1 This is the best way to do it. But, multiple pages in one Jpeg cannot be done using only jpeg. Tiff could be the solution. – Askolein May 28 '14 at 07:36
  • @Askolein: I updated my answer about multiple images to a single image – HABJAN May 28 '14 at 07:39
  • 3
    +1 best solution I've found so far on converting PDF to image format. The RasterizerSample1 class really helped. I used the Sample1() method and that worked straight away. The RasterizerSample link you've posted is broken, here is the link to the class I used: https://github.com/jhabjan/Ghostscript.NET/blob/master/Ghostscript.NET.Samples/Samples/RasterizerSample1.cs – blueprintchris Nov 17 '16 at 08:42
  • 19
    Note that Ghostscript itself is licensed under AGPL and cannot be used in commercial projects for free. I may recommend to use [Poppler](https://poppler.freedesktop.org/) tools instead (GPL license) with C# wrapper. – Vitaliy Fedorchenko Apr 07 '17 at 06:44
  • .net version of this library throws "OutOfMemory" exception for simple conversion operation. – Disappointed Feb 13 '18 at 09:46
  • @HABJAN Can I use GhostScript.NET dll without any installation of exe as I need to host my application on Azure? – Yogen Darji Apr 17 '18 at 06:03
  • 2
    After I opened the pdf the pagecount is "0". What could be the reason ? – Sachintha Nayanajith Aug 15 '19 at 05:25
  • 1
    @SachinthaNayanajith - looks like an open issue: https://github.com/jhabjan/Ghostscript.NET/issues/62 – mche Jan 17 '20 at 17:15
12

As of 2018 there is still no simple answer to the question of how to convert a PDF document to an image in C#; many libraries use Ghostscript, which is licensed under the AGPL, and in most cases an expensive commercial license is required for production use.

A good alternative might be the popular 'pdftoppm' utility, part of the GPL-licensed Poppler tools; it can be used from C# as a command-line tool executed with System.Diagnostics.Process. The Poppler tools are well known in the Linux world, but a Windows build is also available.
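
As a rough sketch of that approach (not the PdfRenderer wrapper's code), the following assumes a pdftoppm executable can be found on the PATH. The class and method names are hypothetical; the -png and -r options are pdftoppm's own switches for PNG output and resolution in DPI.

```csharp
using System;
using System.Diagnostics;

public static class PdfToPpmRunner
{
    // Builds the pdftoppm command line: PNG output at the given resolution (DPI).
    public static string BuildArguments(string pdfPath, string outputPrefix, int dpi)
    {
        return $"-png -r {dpi} \"{pdfPath}\" \"{outputPrefix}\"";
    }

    // Runs pdftoppm and throws if it reports a failure.
    public static void Convert(string pdfPath, string outputPrefix, int dpi = 150)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "pdftoppm", // assumption: the executable is on the PATH
            Arguments = BuildArguments(pdfPath, outputPrefix, dpi),
            UseShellExecute = false,
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Read stderr before waiting so a full pipe cannot block the child process.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                throw new InvalidOperationException(
                    $"pdftoppm exited with code {process.ExitCode}: {errors}");
        }
    }
}
```

pdftoppm writes one file per page, named from the prefix plus the page number (for example page-1.png), so a multi-page PDF still has to be stitched into one image afterwards if that is what you need.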

If you don't want to integrate pdftoppm yourself, you can use my PdfRenderer Poppler wrapper (it supports both the classic .NET Framework and .NET Core); it is not free, but pricing is very affordable.

Peter Mortensen
Vitaliy Fedorchenko
  • 1
    Since poppler itself is GPL I don't see how using a commercial wrapper (regardless of its technical quality / features) enables using poppler in a commercial (non-GPL) product? – StayOnTarget Mar 27 '19 at 18:41
  • PdfRenderer uses poppler tools as a standalone command line programs (executed with System.Diagnostics.Process), not as library. This kind of usage is allowed by GPL: https://www.gnu.org/licenses/gpl-faq.html#GPLInProprietarySystem - when you use PdfRenderer you should clearly state that your program executes GPL poppler for some functions, and user can use poppler utilities without your program as free software. In case of web app, you are end-user and you can install/use GPL program on your server (as you don't redistribute it). – Vitaliy Fedorchenko Mar 28 '19 at 09:16
11

I used PDFiumSharp and ImageSharp in a .NET Standard 2.1 class library.

/// <summary>
/// Saves a thumbnail (jpg) to the same folder as the PDF file, using dimensions 300x423,
/// which corresponds to the aspect ratio of 'A' paper sizes like A4 (ratio h/w=sqrt(2))
/// </summary>
/// <param name="pdfPath">Source path of the pdf file.</param>
/// <param name="thumbnailPath">Target path of the thumbnail file.</param>
/// <param name="width">Thumbnail width in pixels.</param>
/// <param name="height">Thumbnail height in pixels.</param>
public static void SaveThumbnail(string pdfPath, string thumbnailPath = "", int width = 300, int height = 423)
{
    using var pdfDocument = new PdfDocument(pdfPath);
    var firstPage = pdfDocument.Pages[0];

    using var pageBitmap = new PDFiumBitmap(width, height, true);

    firstPage.Render(pageBitmap);

    var imageJpgPath = string.IsNullOrWhiteSpace(thumbnailPath)
        ? Path.ChangeExtension(pdfPath, "jpg")
        : thumbnailPath;
    using var image = Image.Load(pageBitmap.AsBmpStream());

    // Set the background to white, otherwise it's black. https://github.com/SixLabors/ImageSharp/issues/355#issuecomment-333133991
    image.Mutate(x => x.BackgroundColor(Rgba32.White));

    image.Save(imageJpgPath, new JpegEncoder());
}
Peter Mortensen
tjeerdhans
8

The PDF engine used in Google Chrome, called PDFium, is open source under the "BSD 3-clause" license. I believe this allows redistribution when used in a commercial product.

There is a .NET wrapper for it called PdfiumViewer (NuGet) which works well to the extent I have tried it. It is under the Apache license which also allows redistribution.

(Note that this is NOT the same 'wrapper' as https://pdfium.patagames.com/ which requires a commercial license)

(There is one other PDFium .NET wrapper, PDFiumSharp, but I have not evaluated it.)

In my opinion, this may so far be the best choice among open-source (free as in beer) PDF libraries for the job, because it places no restrictions on the closed-source or commercial nature of the software using it. To the best of my knowledge, nothing else in the answers here satisfies those criteria.

StayOnTarget
  • 1
    Note: The PdfiumViewer project has been archived and is not actively being developed. :-( The GitHub repository and NuGet package are still available for download. – Jeff Jun 05 '20 at 20:13
  • PdfiumCore can be a replacement for PdfiumViewer. see my answer below. – HamedH May 28 '21 at 19:51
8

Searching for a powerful and free solution for .NET Core that works on Windows and Linux led me to https://github.com/Dtronix/PDFiumCore and https://github.com/GowenGit/docnet. As PDFiumCore uses a much newer version of PDFium (which seems to be a critical point when choosing a PDF library), I ended up using it.

Note: If you want to use it on Linux you should install 'libgdiplus' as https://stackoverflow.com/a/59252639/6339469 suggests.

Here is a simple single-threaded example:

var pageIndex = 0;
var scale = 2;

fpdfview.FPDF_InitLibrary();

var document = fpdfview.FPDF_LoadDocument("test.pdf", null);

var page = fpdfview.FPDF_LoadPage(document, pageIndex);

var size = new FS_SIZEF_();
// Query the size of the page being rendered (pageIndex, not a hard-coded 0).
fpdfview.FPDF_GetPageSizeByIndexF(document, pageIndex, size);

var width = (int)Math.Round(size.Width * scale);
var height = (int)Math.Round(size.Height * scale);

var bitmap = fpdfview.FPDFBitmapCreateEx(
    width,
    height,
    4, // BGRA
    IntPtr.Zero,
    0);

fpdfview.FPDFBitmapFillRect(bitmap, 0, 0, width, height, (uint)Color.White.ToArgb());

// |          | a b 0 |
// | matrix = | c d 0 |
// |          | e f 1 |
using var matrix = new FS_MATRIX_();
using var clipping = new FS_RECTF_();

matrix.A = scale;
matrix.B = 0;
matrix.C = 0;
matrix.D = scale;
matrix.E = 0;
matrix.F = 0;

clipping.Left = 0;
clipping.Right = width;
clipping.Bottom = 0;
clipping.Top = height;

fpdfview.FPDF_RenderPageBitmapWithMatrix(bitmap, page, matrix, clipping, (int)RenderFlags.RenderAnnotations);

var bitmapImage = new Bitmap(
    width,
    height,
    fpdfview.FPDFBitmapGetStride(bitmap),
    PixelFormat.Format32bppArgb,
    fpdfview.FPDFBitmapGetBuffer(bitmap));

bitmapImage.Save("test.jpg", ImageFormat.Jpeg);

For a thread safe implementation see this: https://github.com/hmdhasani/DtronixPdf/blob/master/src/DtronixPdfBenchmark/Program.cs

HamedH
6

Regarding PDFiumSharp: after some experimentation I was able to create PNG files from a PDF file.

This is my code:

using PDFiumSharp;
using System;
using System.Collections.Generic;
using System.Drawing;
using System.IO;

public class Program
{
    static public void Main(String[] args)
    {
        var renderfoo = new Renderfoo();
        renderfoo.RenderPDFAsImages(@"C:\Temp\example.pdf", @"C:\temp");
    }
}



public class Renderfoo
{

    public void RenderPDFAsImages(string Inputfile, string OutputFolder)
    {
        string fileName = Path.GetFileNameWithoutExtension(Inputfile);

        using (PDFiumSharp.PdfDocument doc = new PDFiumSharp.PdfDocument(Inputfile))
        {
            for (int i = 0; i < doc.Pages.Count; i++)
            {
                var page = doc.Pages[i];
                using (var bitmap = new System.Drawing.Bitmap((int)page.Width, (int)page.Height))
                {
                    // Start from a white background before rendering the page.
                    using (var graphics = Graphics.FromImage(bitmap))
                    {
                        graphics.Clear(Color.White);
                    }
                    // Render(Bitmap) comes from the PDFiumSharp.GdiPlus assembly.
                    page.Render(bitmap);
                    var targetFile = Path.Combine(OutputFolder, fileName + "_" + i + ".png");
                    bitmap.Save(targetFile);
                }
            }
        }
    }

}

For starters, you need to take the following steps to get the PDFium wrapper up and running:

  • Run Custom Tool on both .tt files via right-click in Visual Studio
  • Compile the GDIPlus Project
  • Copy the compiled assemblies (from the GDIPlus project) to your project
  • Reference both PDFiumSharp and PDFiumsharp.GdiPlus assemblies in your project

  • Make sure that pdfium_x64.dll and/or pdfium_x86.dll are both found in your project output directory.

Peter Mortensen
Dominik Sand
  • 3
    cannot convert from 'System.Drawing.Bitmap' to 'PDFiumSharp.PDFiumBitmap for this line: page.Render(bitmap); – CountLessQ Jan 31 '20 at 14:56
  • 2
    Your Error is expected if you don't add the class RenderingExtensionsGdiPlus , which is contained in the GDI Plus Assembly. Without the Assembly and the containing class it won't work. – Dominik Sand Feb 03 '20 at 06:34
  • How do I add the RenderingExtensionsGdiPlus class? I can't find it... – Rafael Ventura Dec 20 '20 at 14:07
6

You may check out Freeware.Pdf2Png, which is under the MIT license. Just search NuGet for that name.

var dd = System.IO.File.ReadAllBytes("pdffile.pdf");
byte[] pngByte = Freeware.Pdf2Png.Convert(dd, 1);
System.IO.File.WriteAllBytes(System.IO.Path.Combine(@"C:\temp", "dd.png"), pngByte);
Haryono
4

The NuGet package Pdf2Png is available for free and is released under the permissive MIT License.

I've tested it a bit, and this is the code to convert a PDF file to an image (it saves the image in the debug folder).

using cs_pdf_to_image;
using PdfToImage;

private void BtnConvert_Click(object sender, EventArgs e)
{
    if(openFileDialog1.ShowDialog() == DialogResult.OK)
    {
        try
        {
            string PdfFile = openFileDialog1.FileName;
            string PngFile = "Convert.png";
            List<string> Conversion = cs_pdf_to_image.Pdf2Image.Convert(PdfFile, PngFile);
            Bitmap Output = new Bitmap(PngFile);
            PbConversion.Image = Output;
        }
        catch(Exception E)
        {
            MessageBox.Show(E.Message);
        }
    }
}
Peter Mortensen
Melvin Winthagen
  • 1
    @MaxVollmer I think most of your feedback has been addressed. – StayOnTarget Mar 27 '19 at 18:43
  • 6
    When you click through the Nuget package to the project page (https://github.com/chen0040/cs-pdf-to-image) it mentions that it uses GhostScript. So it does not have the licensing benefits that it would first appear. – StayOnTarget Mar 27 '19 at 18:44
  • 3
    I tested quickly and found: 1) it only converts the first page of a multi-page PDF; 2) image resolution was poor for the font in the PDF I tested. The output image was only 612 × 792 px, so this may account for the poor resolution. From the comments on the project, it seems others faced the same issues. – mike Apr 08 '19 at 03:02
  • 2
    pdf2png package has very poor output quality. – bmi Dec 19 '19 at 09:43
1

Apache PDFBox also works great for me.

Usage with the command-line tool (input.pdf is a placeholder for the PDF to convert):

java -jar pdfbox-app-2.0.19.jar PDFToImage -quality 1.0 -dpi 150 -prefix out_dir/page -format png input.pdf
Barna Kovacs
1

There is a free NuGet package (Pdf2Image) that extracts PDF pages to JPG files or to a collection of images (List<Image>) in just one line:

        string file = "c:\\tmp\\test.pdf";

        List<System.Drawing.Image> images = PdfSplitter.GetImages(file, PdfSplitter.Scale.High);

        PdfSplitter.WriteImages(file, "c:\\tmp", PdfSplitter.Scale.High, PdfSplitter.CompressionLevel.Medium);

The full source is also available on GitHub: Pdf2Image

Kabindas
  • 1
    This is using itextsharp and pdfium. Why not recommend Pdfium at first place? – H.A.H. Jun 24 '22 at 11:23
  • 1
    @H.A.H. Because with this lib, working as a wrapper on itextsharp and pdfium, you only need 3 lines of code to extract images from PDF. But if you prefer working with pdfium to achieve the same results, be my guest. – Kabindas Aug 29 '22 at 23:36
0

On Android, you can convert all the PDF pages into images with the platform's built-in PdfRenderer class (android.graphics.pdf, API level 21 and later), with no third-party library. The code below saves each page of a PDF as a separate image, and it is fast.

ParcelFileDescriptor fileDescriptor = ParcelFileDescriptor.open(
        new File("pdfFilePath.pdf"), ParcelFileDescriptor.MODE_READ_ONLY);
PdfRenderer renderer = new PdfRenderer(fileDescriptor);
final int pageCount = renderer.getPageCount();
for (int i = 0; i < pageCount; i++) {
    PdfRenderer.Page page = renderer.openPage(i);
    Bitmap bitmap = Bitmap.createBitmap(page.getWidth(), page.getHeight(), Bitmap.Config.ARGB_8888);
    // Fill the bitmap with white before rendering the page onto it.
    Canvas canvas = new Canvas(bitmap);
    canvas.drawColor(Color.WHITE);
    page.render(bitmap, null, null, PdfRenderer.Page.RENDER_MODE_FOR_DISPLAY);
    page.close();

    // Skip pages that rendered completely white.
    if (bitmapIsBlankOrWhite(bitmap))
        continue;

    String root = Environment.getExternalStorageDirectory().toString();
    // "filename" is the base output name, defined elsewhere. The page index
    // keeps each page from overwriting the previous one.
    File file = new File(root, filename + "_" + i + ".png");

    if (file.exists()) file.delete();
    try (FileOutputStream out = new FileOutputStream(file)) {
        bitmap.compress(Bitmap.CompressFormat.PNG, 100, out);
        Log.v("Saved Image - ", file.getAbsolutePath());
    } catch (Exception e) {
        e.printStackTrace();
    }
}
// Closing the renderer also closes the file descriptor it was given.
renderer.close();

=======================================================

private static boolean bitmapIsBlankOrWhite(Bitmap bitmap) {
    if (bitmap == null)
        return true;

    int w = bitmap.getWidth();
    int h = bitmap.getHeight();
    for (int i =  0; i < w; i++) {
        for (int j = 0; j < h; j++) {
            int pixel =  bitmap.getPixel(i, j);
            if (pixel != Color.WHITE) {
                return false;
            }
        }
    }
    return true;
}
Peter Mortensen
Rahul
-1

I kind of bumped into this project at SourceForge. It seems to me it's still active.

  1. PDF convert to JPEG at SourceForge
  2. Developer's site

My two cents.

Peter Mortensen
agodinhost
-2

https://www.codeproject.com/articles/317700/convert-a-pdf-into-a-series-of-images-using-csharp

I found this GhostScript wrapper to be working like a charm for converting the PDFs to PNGs, page by page.

Usage:

string pdf_filename = @"C:\TEMP\test.pdf";            
var pdf2Image = new Cyotek.GhostScript.PdfConversion.Pdf2Image(pdf_filename);
// Page numbers start at 1, so use <= to include the last page.
for (var page = 1; page <= pdf2Image.PageCount; page++)
{
    string png_filename = @"C:\TEMP\test" + page + ".png";
    pdf2Image.ConvertPdfPageToImage(png_filename, page);
}

Because it is built on GhostScript, the licensing question remains for commercial applications.

cccec
-7

(Disclaimer: I worked on this component at Software Siglo XXI.)

You could use Super Pdf2Image Converter to generate a TIFF multi-page file with all the rendered pages from the PDF in high resolution. It's available for both 32 and 64 bit and is very cheap and effective. I'd recommend you to try it.

Just one line of code...

GetImage(outputFileName, firstPage, lastPage, resolution, imageFormat)

Converts the specified pages to images and saves them to outputFileName (TIFF allows multiple pages in one file; other formats produce several files).

You can take a look here: http://softwaresigloxxi.com/SuperPdf2ImageConverter.html

M. Cota