c# PDF to Bmp for free

Question

I am writing a program that uses OCR (tessnet2) to scan an image file and extract certain information. This was easy until I found out that I was going to be scanning PDF attachments from an Exchange server.

The first problem I am working on is how to convert my PDFs to BMP files. From what I can tell so far, TessNet2 can only read image files, specifically BMP. So I now need to convert a PDF of indeterminate size (2 to 15 pages) to BMP images. Once that is done, I can easily scan each image with the TessNet2 code I have already built.
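Once the pages exist as separate BMP files, they need to reach the OCR step in page order. A minimal sketch, assuming the converter names pages with a trailing number such as page-1.bmp through page-15.bmp: a plain alphabetical sort puts page-10 before page-2, so this hypothetical PageOrdering helper sorts by the number instead.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class PageOrdering
{
    // Extracts the trailing page number from names like "invoice-12.bmp".
    // Returns int.MaxValue when no number is found, so such files sort last.
    public static int PageNumber(string path)
    {
        string name = Path.GetFileNameWithoutExtension(path);
        Match m = Regex.Match(name, @"(\d+)$");
        return m.Success ? int.Parse(m.Groups[1].Value) : int.MaxValue;
    }

    // Sorts page images numerically, so "page-2" comes before "page-10".
    // Ties (same number) fall back to an ordinal name comparison.
    public static string[] InPageOrder(string[] paths) =>
        paths.OrderBy(PageNumber).ThenBy(p => p, StringComparer.Ordinal).ToArray();
}
```

For example, run Directory.GetFiles(folder, "*.bmp") through InPageOrder and hand each result to the existing TessNet2 code; the OCR call itself is left out here.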

I have seen Ghostscript used for this task. I'm just wondering if there is another free solution, or if one of you fine humans could give me a crash course on how to do this using Ghostscript.
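For the Ghostscript route, a rough crash course: Ghostscript is a command-line program, so from C# you can start it as a separate process and let it write one BMP per page. This is a sketch, assuming Ghostscript is installed and its console executable (gswin64c.exe for the 64-bit Windows build, gs on Linux/macOS) is on the PATH, and assuming .NET Core 2.1 or later for ProcessStartInfo.ArgumentList. The GhostscriptBmp class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;

public static class GhostscriptBmp
{
    // Builds the Ghostscript arguments that render every page of a PDF to 24-bit BMP files.
    // Ghostscript replaces "%d" in the output name with the page number (1, 2, 3, ...).
    public static List<string> BuildArguments(string pdfPath, string outputFolder, int dpi = 300)
    {
        return new List<string>
        {
            "-dSAFER",          // restrict file access by the document being processed
            "-dBATCH",          // exit when finished instead of waiting at a prompt
            "-dNOPAUSE",        // do not pause between pages
            "-sDEVICE=bmp16m",  // 24-bit color BMP; "bmpgray" gives 8-bit grayscale
            "-r" + dpi,         // rendering resolution in dpi
            "-sOutputFile=" + Path.Combine(outputFolder, "page-%d.bmp"),
            pdfPath
        };
    }

    // Runs Ghostscript and waits for it to finish.
    // The executable name is an assumption: adjust it for your installation.
    public static void Convert(string pdfPath, string outputFolder, string gsExe = "gswin64c.exe")
    {
        Directory.CreateDirectory(outputFolder); // Ghostscript does not create folders

        var psi = new ProcessStartInfo(gsExe) { UseShellExecute = false };
        foreach (string arg in BuildArguments(pdfPath, outputFolder))
            psi.ArgumentList.Add(arg); // each argument is quoted for us, so spaces are safe

        using (Process gs = Process.Start(psi))
        {
            gs.WaitForExit();
            if (gs.ExitCode != 0)
                throw new InvalidOperationException("Ghostscript exited with code " + gs.ExitCode);
        }
    }
}
```

Calling GhostscriptBmp.Convert(@"C:\fax\doc.pdf", @"C:\fax\pages") should leave page-1.bmp, page-2.bmp and so on in the output folder. For black-and-white faxes, the bmpgray or bmpmono devices produce smaller files than bmp16m.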

Why not just use Ghostscript or iTextSharp, or SharpPdf or some other free PDF library to open the pdf and get the text directly? — Chris Dunaway, Jul 09 '13 at 21:47
Because the PDFs are images received by our fax machine, scanned into our system as TIFF files, and then packaged together as PDFs. So I have to use OCR to read the information, which is very sketchy at times. — MaylorTaylor, Jul 09 '13 at 21:51
I believe none of them are free for commercial purposes. I'm a newbie to Ghostscript; is it free for commercial apps? This link has many terms and I feel most of them conclude it's not free for commercial apps: http://www.artifex.com/page/licensing-information.html Any idea? — Hemant Tank, Sep 26 '16 at 13:49

score 2 · Answer 1 · answered Mar 19 '19 at 12:49

You can use ImageMagick too. And it's totally free! No trial or payment.

Just download the ImageMagick .exe from here and install it, then get the NuGet package from here.

Here is the code! Hope I helped! (even though the question was asked 6 years ago...)

Procedure:

    using System.IO;
    using ImageMagick;

    public void PDFToBMP(string output)
    {
        MagickReadSettings settings = new MagickReadSettings();
        // Setting the density to 500 dpi will create an image with better quality
        settings.Density = new Density(500);

        string[] files = GetFiles();
        foreach (string file in files)
        {
            string fichwithout = Path.GetFileNameWithoutExtension(file);
            string path = Path.Combine(output, fichwithout);
            using (MagickImageCollection images = new MagickImageCollection())
            {
                // Pass the settings so the density is applied while the PDF is read
                images.Read(file, settings);
                int page = 1;
                foreach (var image in images)
                {
                    image.Format = MagickFormat.Bmp; //if you want to do other formats of image, just change the format here!
                    // ...and the extension here! The page number keeps each page from overwriting the previous one
                    image.Write(path + "-" + page + ".bmp");
                    page++;
                }
            }
        }
    }
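The 500 dpi density in the code above has a real cost, because BMP is uncompressed. As a rough worked example, assuming US Letter pages (8.5 × 11 in) and 24-bit output: 500 dpi gives 4250 × 5500 pixels with each row padded to 12,752 bytes, about 70 MB per page; 300 dpi gives 2550 × 3300 pixels, about 25 MB. Tesseract's own guidance suggests at least 300 dpi for good OCR, so a lower density may be worth trying. The BmpSize helper is hypothetical and assumes the classic 54-byte BMP header; an encoder may write a larger header or another bit depth, so treat the results as estimates.

```csharp
using System;

public static class BmpSize
{
    // Pixel dimensions of a page rendered at the given density (dots per inch).
    public static (int Width, int Height) PixelSize(double widthInches, double heightInches, int dpi) =>
        ((int)Math.Round(widthInches * dpi), (int)Math.Round(heightInches * dpi));

    // Approximate size in bytes of an uncompressed 24-bit BMP:
    // 3 bytes per pixel, each row padded up to a multiple of 4 bytes,
    // plus a 54-byte header (14-byte file header + 40-byte info header).
    public static long Bmp24Bytes(int width, int height)
    {
        long rowBytes = ((long)width * 3 + 3) / 4 * 4;
        return 54 + rowBytes * height;
    }
}
```

For example, BmpSize.Bmp24Bytes(4250, 5500) evaluates to 70,136,054 bytes, which is why a 15-page fax at 500 dpi can need around a gigabyte of BMP files.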

Function GetFiles():

    using System.Collections.Generic;
    using System.IO;

    public string[] GetFiles()
    {
        const string folder = @"your\path";

        // CreateDirectory does nothing if the folder already exists
        Directory.CreateDirectory(folder);

        var list = new List<string>();
        // Only pick up the PDF files in the folder
        foreach (FileInfo info in new DirectoryInfo(folder).GetFiles("*.pdf"))
        {
            // HACK: Just skip the protected samples file...
            if (info.Name.IndexOf("protected") == -1)
                list.Add(info.FullName);
        }
        return list.ToArray();
    }

I believe it still needs Ghostscript to be installed on the PC to open PDF files: https://www.imagemagick.org/script/formats.php — VDN, Jan 06 '20 at 13:33
That code worked for my PDF files and I did not need Ghostscript... But it is useful software for this kind of work. — Sofia Rodrigues, Jan 12 '20 at 18:35
Thanks Sofia! Just two things you might have missed: what are file (in your second function) and fich? Thanks again! — Federico Navarrete, Mar 06 '20 at 13:30

score 0 · Accepted Answer · answered Jul 09 '13 at 21:43

Found a CodeProject article on converting PDFs to Images:

http://www.codeproject.com/Articles/57100/Simple-and-Free-PDF-to-Image-Conversion

Curtis Rutland

The Adobe Reader XI EULA states "3.2 Server Use. This agreement does not permit you to install or Use the Software on a computer file server." – OnceUponATimeInTheWest Jul 12 '13 at 13:38
According to the comments by the author on that tutorial, apparently he was using Acrobat. – Curtis Rutland Jul 12 '13 at 20:16
This isn't free. It requires Acrobat Professional to be installed, which is not free. – Jayanga Kaushalya Aug 06 '13 at 11:05
According to the article itself: "You must have "Adobe Acrobat Reader" installed on your system". Reader is the free version. If the author was wrong about what was required, you should make a comment on that article – Curtis Rutland Aug 06 '13 at 18:38

score 0 · Answer 3 · answered Sep 19 '22 at 08:27

I recognize this is a very old question, but it is an ongoing problem. If you are targeting .NET 6 or later, please take a look at my library, Melville.PDF.

Melville.Pdf is an MIT-licensed C# implementation of a PDF renderer. I hope it serves a need that I have felt for some time.

If you are trying to get text out of PDF documents, rendering plus OCR may be the hard way around. Some PDF files are just a thin wrapper around image objects, but many actually contain text. Melville.PDF does not do text extraction (yet), but extracting the text directly might be an easier way to get it out of some files.
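Before rendering and running OCR, it can be worth checking whether a PDF contains any text at all. A crude sketch, not a PDF parser and not part of Melville.PDF: text in a PDF is drawn with fonts, so a file with no /Font key anywhere is likely image-only, like the fax PDFs in the question. It assumes .NET 5 or later for Encoding.Latin1, and the PdfTextHeuristic name is hypothetical. Because PDF 1.5 and later can store dictionaries inside compressed object streams, a false result is not conclusive.

```csharp
using System.IO;
using System.Text;

public static class PdfTextHeuristic
{
    // Crude heuristic: looks for a "/Font" key in the raw, undecoded bytes.
    // true  -> the file probably draws text with fonts, so direct text extraction may work.
    // false -> no uncompressed font reference was seen; the file may be image-only,
    //          or its dictionaries may be hidden in compressed object streams.
    public static bool ProbablyHasText(byte[] pdfBytes)
    {
        // Latin-1 maps every byte to exactly one char, so binary data cannot break the search.
        string raw = Encoding.Latin1.GetString(pdfBytes);
        return raw.Contains("/Font");
    }

    public static bool ProbablyHasText(string pdfPath) =>
        ProbablyHasText(File.ReadAllBytes(pdfPath));
}
```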
