19

Is there any way to convert an HTML document (a file, not a URL) to an image, or a PDF to an image?

I can already do this using the Ghostscript DLL. Is there any other way to do it without using the Ghostscript DLL?

I am developing a C# Windows application.

Furqan Safdar
D J

9 Answers

7

The best free NuGet package for saving every page of a PDF as a PNG at a custom resolution is Docnet.Core. It can be used in .NET Core projects.

The project has a GitHub repository with nice examples, but here is my own code for reading a PDF with more than one page:

        string webRootPath = _hostingEnvironment.WebRootPath;
        string fullPath = webRootPath + "/uploads/user-manual/file.pdf";
        string fullPaths = webRootPath + "/uploads/user-manual";

        using (var library = DocLib.Instance)
        {
            // 1080 and 1920 bound the dimensions of the rendered page image
            using (var docReader = library.GetDocReader(fullPath, 1080, 1920))
            {
                // Page indexes are zero-based, so start at 0 to include the first page
                for (int i = 0; i < docReader.GetPageCount(); i++)
                {
                    using (var pageReader = docReader.GetPageReader(i))
                    {
                        // EmailTemplates.GetModifiedImage is a helper not shown here;
                        // it is assumed to return the page as PNG bytes
                        var bytes = EmailTemplates.GetModifiedImage(pageReader);

                        System.IO.File.WriteAllBytes(fullPaths + "/page_image_" + i + ".png", bytes);
                    }
                }
            }
        }

You can find other functions in their GitHub repo.
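
The GetModifiedImage helper used above is not shown in this answer. As a rough sketch of what such a helper might do, assuming the page reader hands back a raw buffer of 4 bytes per pixel (as the other Docnet answers on this page treat it), the following encodes that buffer as PNG bytes. The class name PageImageEncoder and the method ToPng are hypothetical, not part of Docnet, and the code relies on System.Drawing, so it targets Windows.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Runtime.InteropServices;

public static class PageImageEncoder
{
    // Hypothetical helper: encodes one page's raw 32-bit pixel buffer as PNG bytes.
    // With Docnet the inputs would come from pageReader.GetImage(),
    // pageReader.GetPageWidth() and pageReader.GetPageHeight().
    public static byte[] ToPng(byte[] rawBytes, int width, int height)
    {
        if (rawBytes.Length != width * height * 4)
            throw new ArgumentException("Expected 4 bytes per pixel.", nameof(rawBytes));

        using (var bmp = new Bitmap(width, height, PixelFormat.Format32bppArgb))
        {
            var rect = new Rectangle(0, 0, width, height);
            BitmapData data = bmp.LockBits(rect, ImageLockMode.WriteOnly, bmp.PixelFormat);
            try
            {
                // At 32 bits per pixel each row is width * 4 bytes, so a single copy fills the bitmap.
                Marshal.Copy(rawBytes, 0, data.Scan0, rawBytes.Length);
            }
            finally
            {
                bmp.UnlockBits(data);
            }

            using (var stream = new MemoryStream())
            {
                bmp.Save(stream, ImageFormat.Png);
                return stream.ToArray();
            }
        }
    }
}
```

Inside the page loop, the call would then look like PageImageEncoder.ToPng(pageReader.GetImage(), pageReader.GetPageWidth(), pageReader.GetPageHeight()).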

Mohammad Hassani
5

Use LibPdf for PDF-to-image conversion.

The LibPdf library converts a PDF file to an image. Supported image formats are PNG and BMP, but you can easily add more.

Usage example:

// Read the whole input PDF into memory
byte[] bytes = File.ReadAllBytes(@"..\path\to\pdf\file.pdf"); // in file
using (var pdf = new LibPdf(bytes))
{
    byte[] pngBytes = pdf.GetImage(0, ImageType.PNG); // image type
    File.WriteAllBytes(@"..\path\to\pdf\file.png", pngBytes); // out file
}

You should also look at ImageMagick, a freely available and powerful tool. It is capable of doing what you want and also provides some .NET bindings (as well as bindings to several other languages).

In its simplest form, it is a single command:

convert file.pdf imagefile.png
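
From a C# application, one way to use that command is to start it as a separate process. The following is a minimal sketch, not the answer author's method: the class ImageMagickRunner and its methods are hypothetical names, and it assumes ImageMagick is installed and reachable under the executable name you pass in. As a comment on this page notes, ImageMagick relies on Ghostscript for PDF input, so Ghostscript still needs to be installed.

```csharp
using System;
using System.Diagnostics;

public static class ImageMagickRunner
{
    // Builds the argument string for "<executable> <pdf> <png>", quoting both paths
    // so that paths containing spaces are passed as single arguments.
    public static string BuildArguments(string pdfPath, string pngPath)
    {
        return "\"" + pdfPath + "\" \"" + pngPath + "\"";
    }

    // Runs the converter and returns its exit code (0 conventionally means success).
    // executable is an assumption: "convert" as in the command above, or "magick" on ImageMagick 7.
    public static int Run(string executable, string pdfPath, string pngPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = executable,
            Arguments = BuildArguments(pdfPath, pngPath),
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardError = true
        };

        using (var process = Process.Start(startInfo))
        {
            // Read stderr before waiting so a full pipe cannot block the child process.
            string errors = process.StandardError.ReadToEnd();
            process.WaitForExit();
            if (process.ExitCode != 0)
                Console.Error.WriteLine(errors);
            return process.ExitCode;
        }
    }
}
```

On Windows, passing a full path to the ImageMagick executable is the safer choice, because the system directory also contains an unrelated convert.exe utility.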
Furqan Safdar
  • Corrected the link for LibPdf which pointed to ImageMagick as well. –  Oct 11 '12 at 04:26
  • 2
    Thanks a lot. I am still getting the exception "Could not load file or assembly 'libpdf.DLL' or one of its dependencies. The specified module could not be found." – D J Oct 11 '12 at 07:35
  • Which .NET framework are you using, in which you have included this library? – Furqan Safdar Oct 11 '12 at 07:42
  • 4.0. I have included only the pdflib.dll, downloaded from http://code.google.com/p/lib-pdf/ – D J Oct 11 '12 at 07:44
  • Try checking your framework for confirmation. If your project framework is set to '.NET Framework 4 Client Profile', change it to '.NET Framework 4' – Furqan Safdar Oct 11 '12 at 07:52
  • 1
    I am also getting the same problem; my application targets .NET Framework 4. Can anyone please suggest how to overcome this issue? – D J Oct 11 '12 at 09:16
  • Changed the version, still the same issue. – D J Oct 11 '12 at 10:07
  • Try this link for library inclusion problem, this may help you. http://stackoverflow.com/questions/4469929/could-not-load-file-or-assembly-or-one-of-its-dependencies – Furqan Safdar Oct 11 '12 at 11:09
  • 1
    This is old, but there's several DLLs the binary distribution of lib-pdf seem to be missing: QtXml4.dll, freetype.dll, and Zlib.dll... It's not clear to me which versions of Freetype and Zlib are needed (although Zlib doesn't change often), but you can examine the DLLs packaged with the distribution to figure out which 4.x version of QtXml4.dll is required. See http://qt-windows-binaries.googlecode.com/svn/site/QtWindowsBinaries .... that being said, I still haven't gotten this library to function properly. – Kaganar Aug 09 '13 at 21:43
  • 11
    Was LibPDF recommended based solely on a Google search? The code sample provided is just copied from their page. The problem is that, so far as I can tell, this library does not work. I messed with it for an hour or two with no success. There are quite a few people posting in the issues log that they cannot get it to work either. – Sean Worle Oct 17 '13 at 21:47
  • Any free utilities like LibPdf that work in 64-bit mode? – Shaul Behr Jul 30 '14 at 16:04
  • 24
    ImageMagick uses GhostScript to convert PDFs. – CleverPatrick Sep 20 '14 at 19:17
  • 4
    Don't understand why this was accepted as the answer when no one can get it to work. – blueprintchris Nov 16 '16 at 10:01
4

Try Freeware.Pdf2Png; see the URL below:

PDF to PNG converter.

byte[] png = Freeware.Pdf2Png.Convert(pdf, 1);

https://www.nuget.org/packages/Freeware.Pdf2Png/1.0.1?_src=template

The package's about info says it uses the MIT license; I checked on March 22, 2022. But as Mitya said, please double-check.

Haryono
2

Using Docnet, and based on this example on GitHub, I wrote the following, which is simple and functional:

PDF used in this example.

//...
using Docnet.Core;
using System.IO;
using Docnet.Core.Models;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

//paths
string pathPdf = @"C:\pathToPdfFile\lorem-ipsum.pdf";
string finalPathWithFileName = @"C:\pathToFinalImageFile\finalFile.png";

//using docnet
using (var docReader = DocLib.Instance.GetDocReader(pathPdf, new PageDimensions(1080, 1920)))
{
    //read the first page (index 0)
    using (var pageReader = docReader.GetPageReader(0))
    {
        var rawBytes = pageReader.GetImage();
        var width = pageReader.GetPageWidth();
        var height = pageReader.GetPageHeight();
        var characters = pageReader.GetCharacters();

        //using bitmap to create a png image
        using (var bmp = new Bitmap(width, height, PixelFormat.Format32bppArgb))
        {
            AddBytes(bmp, rawBytes);

            using (var stream = new MemoryStream())
            {
                //saving and exporting
                bmp.Save(stream, ImageFormat.Png);
                File.WriteAllBytes(finalPathWithFileName, stream.ToArray());
            }
        }
    }
}

//extra methods
private static void AddBytes(Bitmap bmp, byte[] rawBytes)
{
    var rect = new Rectangle(0, 0, bmp.Width, bmp.Height);

    var bmpData = bmp.LockBits(rect, ImageLockMode.WriteOnly, bmp.PixelFormat);
    var pNative = bmpData.Scan0;

    Marshal.Copy(rawBytes, 0, pNative, rawBytes.Length);
    bmp.UnlockBits(bmpData);
}

Diego Montania
1

You can use any one of the libraries below for PDF-to-image conversion.

Use Aspose.pdf, link below: http://www.aspose.com/docs/display/pdfnet/Convert+all+PDF+pages+to+JPEG+Images

code sample:

Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(MyPdfPath);
using (FileStream imageStream = new FileStream("MyOutputImage.png", FileMode.Create))
{
    // Render at 300 DPI
    Resolution resolution = new Resolution(300);
    PngDevice pngDevice = new PngDevice(resolution);
    // Write the selected page into the output stream
    pngDevice.Process(pdfDocument.Pages[PageNo], imageStream);
    imageStream.Close();
}

Use Bytescout PDF Renderer link below: http://bytescout.com/products/developer/pdfrenderersdk/convert-pdf-to-png-basic-examples

code sample :

RasterRenderer renderer = new RasterRenderer();
renderer.RegistrationName = "demo";
renderer.RegistrationKey = "demo";
// Load PDF document.
renderer.LoadDocumentFromFile(FilePath);
for (int i = 0; i < renderer.GetPageCount(); i++)
{
    // Render each page into its own stream so that pages do not pile up in one buffer.
    using (MemoryStream imageStream = new MemoryStream())
    {
        renderer.RenderPageToStream(i, RasterOutputFormat.PNG, imageStream);
        // Save one PNG file per page.
        File.WriteAllBytes("MyOutputImage_" + i + ".png", imageStream.ToArray());
    }
}
Chris Schiffhauer
  • 6
    Is aspose.pdf free to use? – D J Oct 11 '12 at 07:49
  • Aspose offers trial versions as well as purchased licenses. –  Oct 11 '12 at 07:50
  • They offer one-month trial versions. After the trial period ends, you will have to purchase it. –  Oct 11 '12 at 08:14
  • **aspose.pdf only lets you extract 4 elements**. If you want to extract more elements you get a System.IndexOutOfRangeException: _"At most 4 elements (for any collection) can be viewed in evaluation mode."_ Maybe this comment helps others save the time of trying it... – PeterCo Jun 14 '22 at 12:30
1

Spire.PDF library can be used for PDF to Image conversion, such as PDF to PNG, JPG, EMF and TIFF etc.

The following code example shows how to convert a PDF to PNG:

 //Load a PDF
 PdfDocument doc = new PdfDocument();
 doc.LoadFromFile("PdfFilePath");

 //Save to PNG images
 for (int i = 0; i < doc.Pages.Count; i++)
 {
     String fileName = String.Format("ToImage-img-{0}.png", i);
     using (Image image = doc.SaveAsImage(i,300,300))
     {
         image.Save(fileName, System.Drawing.Imaging.ImageFormat.Png);
     }
 }

 doc.Close();

More conversion examples can be found in the library's documentation. It also provides a free community edition but with some limitations.

Dheeraj Malik
1

Freeware.Pdf2Png worked great for my needs. It does not only convert to Png, you can save to the image format of your choice.

In MS Visual Studio, run PM> NuGet\Install-Package Freeware.Pdf2Png -Version 1.0.1 in the Package Manager Console, or add the package through the NuGet Package Manager GUI: search for Freeware.Pdf2Png and it should come up.

Once the reference is added to your project, code similar to this should do what you need to convert a PDF to an Image.

using (FileStream fs = new FileStream(FullFilePath, FileMode.Open))
{
    byte[] buff = Freeware.Pdf2Png.Convert(fs, 1);
    MemoryStream ms = new MemoryStream(buff);
    Image img = Image.FromStream(ms);
    img.Save(TiffFilePath, System.Drawing.Imaging.ImageFormat.Tiff);
}

FullFilePath - a string that is the Full File Path to the PDF to be converted.

TiffFilePath - a string that is the Full File Path of the newly created Image that you would like to save.

Unfortunately, I was not able to find any C# code or a proper algorithm to do this conversion without a third-party DLL. If any of you have good information on that, please do share it!

ChefJames
0

While using Ghostscript with ImageMagick is a potential option, it is incredibly slow: every page takes around 5 seconds or more. Docnet is a much better option for converting PDFs to images. The following code converts all pages in a PDF file into images, and does so quickly.

    public void SavePDFtoJPGDocnet(string fileName)
    {
        string FilePath = @"C:\SampleFileFolder\doc.pdf";
        string DestinationFolder = @"C:\SampleFileFolder\";

        IDocLib DocNet = DocLib.Instance;

        //you are specifying the max resolution of image on any side, actual resolution will be limited by longer side,
        //preserving the aspect ratio
        using (var docReader = DocNet.GetDocReader(FilePath, new PageDimensions(1440, 2560)))
        {
            for (int i = 0; i < docReader.GetPageCount(); i++)
            {
                using (var pageReader = docReader.GetPageReader(i))
                {
                    var rawBytes = pageReader.GetImage();

                    var width = pageReader.GetPageWidth();
                    var height = pageReader.GetPageHeight();

                    var characters = pageReader.GetCharacters();

                    //dispose the bitmap and stream once each page has been written
                    using (var bmp = new Bitmap(width, height, PixelFormat.Format32bppArgb))
                    using (var stream = new MemoryStream())
                    {
                        //DocnetClass.AddBytes is a helper not shown here; it is assumed to copy the raw pixel bytes into the bitmap
                        DocnetClass.AddBytes(bmp, rawBytes);
                        //DocnetClass.DrawRectangles(bmp, characters);

                        bmp.Save(stream, ImageFormat.Png);

                        File.WriteAllBytes(Path.Combine(DestinationFolder, "page_image_" + i + ".png"), stream.ToArray());
                    }
                }
            }
        }
    }
Nouman Qaiser
-1

In case someone wants to use Ghostscript.NET.

Ghostscript.NET (written in C#) is the most complete managed wrapper library around the Ghostscript library (32-bit & 64-bit), an interpreter for the PostScript language and PDF.

It depends on the Ghostscript executable, which you have to install on your machine. You can see and download the latest version of the installer here:

https://www.ghostscript.com/download/gsdnld.html

P.S. I had some troubles with the latest version 9.50 not being able to count the pages.

I prefer using the 9.26 version.

https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs926/gs926aw32.exe

https://github.com/ArtifexSoftware/ghostpdl-downloads/releases/download/gs926/gs926aw64.exe

The next step is to find and install Ghostscript.NET from NuGet. I download the PDF from a CDN URL and use a MemoryStream to open and process the PDF file. Here is some sample code:

using (WebClient myWebClient = new WebClient())
            {
                using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
                {
                    /* custom switches can be added before the file is opened

                    rasterizer.CustomSwitches.Add("-dPrinted");

                    */
                    byte[] buffer = myWebClient.DownloadData(pdfUrl);
                    using (var ms = new MemoryStream(buffer))
                    {
                        rasterizer.Open(ms);
                        var image = rasterizer.GetPage(0, 0, 1);

                        var imageURL = "MyCDNpath/Images/" + filename + ".png";
                        _ = UploadFileToS3(image, imageURL);
                    }
                }
            }

You can also use it with a temporary FileStream. Here is another example. Note that the file is temporary and is created with the DeleteOnClose option.

using (WebClient myWebClient = new WebClient())
            {
                using (GhostscriptRasterizer rasterizer = new GhostscriptRasterizer())
                {
                    /* custom switches can be added before the file is opened

                    rasterizer.CustomSwitches.Add("-dPrinted");

                    */
                    byte[] buffer = myWebClient.DownloadData(pdfUrl);

                    int bufferSize = 4096;

                    using (var fileStream = System.IO.File.Create("TempPDFolder/" + pdfName, bufferSize, System.IO.FileOptions.DeleteOnClose))
                    {
                        // now use that fileStream to save the pdf stream
                        fileStream.Write(buffer, 0, buffer.Length);
                        rasterizer.Open(fileStream);
                        var image = rasterizer.GetPage(0, 0, 1);

                        var imageURL = "MyCDNpath/Images/" + filename + ".png";

                        _ = UploadFileToS3(image, imageURL);
                    }
                }
            }

Hope it will help someone struggling to get high quality images from pdf for free.

Community
  • Be aware that ghostscript uses the AGPL licence and you cannot directly link to the DLL from a closed-source program. If you link to ghostscript you must give away the source code for your entire application, or buy a commercial licence from Artifex. This is why imagemagick shells out to the gs command-line program rather than calling the library. See https://artifex.com/licensing/ – jcupitt Sep 07 '21 at 09:39