I'm using Ghostscript.NET, a handy C# wrapper for Ghostscript functionality. A batch of PDFs is sent from the client to an ASP.NET Web API server, converted to images there, and returned to the client.

public static IEnumerable<Image> PdfToImagesGhostscript(byte[] binaryPdfData, int dpi)
{
    List<Image> pagesAsImages = new List<Image>();

    GhostscriptVersionInfo gvi = new GhostscriptVersionInfo(AppDomain.CurrentDomain.BaseDirectory + @"\bin\gsdll32.dll");

    using (var pdfDataStream = new MemoryStream(binaryPdfData))
    using (var rasterizer = new Ghostscript.NET.Rasterizer.GhostscriptRasterizer())
    {
        rasterizer.Open(pdfDataStream, gvi, true);

        for (int i = 1; i <= rasterizer.PageCount; i++)
        {
            Image pageAsImage = rasterizer.GetPage(dpi, dpi, i); // Out of Memory Exception on this line
            pagesAsImages.Add(pageAsImage);
        }
    }
    return pagesAsImages;
}

This generally works fine. (I usually use 500 dpi, which I know is high, but I can reproduce the error even after dropping to 300.) But if the client sends many PDFs (150 one-page PDFs, for example), it often hits an Out of Memory Exception in the Ghostscript.NET rasterizer. How can I overcome this? Should this be threaded? If so, how would that work? Would it help to use the 64-bit version of Ghostscript? Thanks in advance.

HABJAN
Scotty H
  • Can you call `Dispose` on `pageAsImage` after the `Add`? No, threads won't help with this memory problem. Yes, running in a 64-bit process will probably help. If none of those help, then explicitly calling `GC.Collect()` might be necessary (though that's really a bad hack). – Chris O Nov 03 '15 at 21:34
  • @ChrisO Thanks but disposing it makes it inaccessible in the returned object. When trying to use the 64 bit dll I get this error "You are using native Ghostscript library (gsdll64.dll) compiled for 64bit systems in a 32bit process. You need to use gsdll32.dll." Any idea why it's a 32 bit process? I'm running the ASP .NET WebAPI via debug in Visual Studio 2015. – Scotty H Nov 03 '15 at 21:54
  • `Tools | Options | Projects and Solutions | Web Projects | Use the 64 bit version of IIS Express` but I haven't tried this myself with VS2015. – Chris O Nov 03 '15 at 22:42
  • Since switching to 64 bit, I have not encountered an Out of Memory Exception. – Scotty H Nov 04 '15 at 16:07
  • Awesome, glad to hear it. – Chris O Nov 04 '15 at 16:15
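
The bitness error quoted in the comments above comes from loading a native DLL that does not match the process. A minimal sketch of picking the file name at runtime follows. `Environment.Is64BitProcess` is a standard .NET property; the class name, the method names, and the `bin` folder layout (copied from the question) are assumptions for illustration only.

```csharp
using System;
using System.IO;

public static class GhostscriptDllLocator
{
    // Returns the native Ghostscript DLL file name that matches the given process bitness.
    public static string DllNameFor(bool is64BitProcess)
    {
        return is64BitProcess ? "gsdll64.dll" : "gsdll32.dll";
    }

    // Builds the DLL path under <baseDirectory>\bin for the current process.
    // The "bin" sub-folder is an assumption taken from the question's code.
    public static string ResolveDllPath(string baseDirectory)
    {
        return Path.Combine(baseDirectory, "bin", DllNameFor(Environment.Is64BitProcess));
    }
}
```

The resulting path could then be passed to `new GhostscriptVersionInfo(...)` in place of the hard-coded `gsdll32.dll` path, so the same build works whether IIS Express runs as a 32-bit or a 64-bit process.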

2 Answers

I'm new to this myself, on here looking for techniques.

According to the example in the documentation here, they show this:

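for (int page = 1; page <= _rasterizer.PageCount; page++)
{
    // The page is saved as an image, so use an image extension rather than .pdf.
    var docName = String.Format("Page-{0}.png", page);
    var pageFilePath = Path.Combine(outputPath, docName);
    // Pass the loop variable; the original snippet referenced an undefined pageNumber.
    var pageImage = _rasterizer.GetPage(desired_x_dpi, desired_y_dpi, page);
    pageImage.Save(pageFilePath);
    pagesAsImages.Add(pageImage);
}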

It looks like you aren't saving your files.

I am still working at getting something similar to this to work on my end as well. Currently, I have 2 methods that I'm going to try, using the GhostscriptProcessor first:

private static void GhostscriptNetProcess(String fileName, String outputPath)
{
    var version = Ghostscript.NET.GhostscriptVersionInfo.GetLastInstalledVersion();
    var source = (fileName.IndexOf(' ') == -1) ? fileName : String.Format("\"{0}\"", fileName);
    var gsArgs = new List<String>();
    gsArgs.Add("-q");
    gsArgs.Add("-dNOPAUSE");
    gsArgs.Add("-dNOPROMPT");
    gsArgs.Add("-sDEVICE=pdfwrite");
    gsArgs.Add(String.Format(@"-sOutputFile={0}", outputPath));
    gsArgs.Add(source);
    // Dispose the processor so the native Ghostscript instance is released.
    using (var processor = new Ghostscript.NET.Processor.GhostscriptProcessor(version, false))
    {
        processor.Process(gsArgs.ToArray());
    }
}

This version below is similar to yours, and what I started out using until I started finding other code examples:

private static void GhostscriptNetRaster(String fileName, String outputPath)
{
    var version = Ghostscript.NET.GhostscriptVersionInfo.GetLastInstalledVersion();
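    using (var stream = File.Open(fileName, FileMode.Open, FileAccess.Read))
    using (var rasterizer = new Ghostscript.NET.Rasterizer.GhostscriptRasterizer())
    {
        rasterizer.Open(stream, version, false);
        // Page numbers are 1-based.
        for (int page = 1; page <= rasterizer.PageCount; page++)
        {
            // Dispose each page image once it is saved so bitmaps do not pile up in memory.
            using (var img = rasterizer.GetPage(96, 96, page))
            {
                // outputPath is treated as a directory here so pages do not overwrite each other.
                img.Save(Path.Combine(outputPath, String.Format("Page-{0}.png", page)));
            }
        }
    }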
}

Does that get you anywhere?

  • I am trying to get a specific page from a PDF document. The first time it works, but when I try to convert another page from the same PDF, an `out of memory` error stops me. I have been trying to solve this for two days and have tried almost every solution, with no luck yet. Any help would be much appreciated. – metmirr Aug 22 '17 at 15:51
  • @metmirr - That was a contract job, and I no longer have the code for it. The best code segment I have that worked is posted here: https://stackoverflow.com/a/34770558/153923 –  Aug 22 '17 at 16:18
  • If you have a 32-bit PC or your program is using 32-bit plugins, the most memory the process can address is about 2 GB. –  Aug 22 '17 at 16:21

You don't have to rasterize all pages with the same GhostscriptRasterizer instance. Use a separate, disposable rasterizer for each page and collect the results in a List<Image> or List<byte[]>. The example below collects the results as a list of JPEG-encoded byte arrays.

List<byte[]> result = new List<byte[]>();

for (int i = 1; i <= pdfPagesCount; i++)
{
    using (var pageRasterizer = new GhostscriptRasterizer())
    {
        pageRasterizer.Open(stream, gsVersion, true);

        using (Image tempImage = pageRasterizer.GetPage(dpiX, dpiY, i))
        {
            var encoder = ImageCodecInfo.GetImageEncoders().First(c => c.FormatID == System.Drawing.Imaging.ImageFormat.Jpeg.Guid);
            var encoderParams = new EncoderParameters() { Param = new[] { new EncoderParameter(System.Drawing.Imaging.Encoder.Quality, 95L) } };

            using (MemoryStream memoryStream = new MemoryStream())
            {
                tempImage.Save(memoryStream, encoder, encoderParams);
                result.Add(memoryStream.ToArray());
            }
        }
    }
}

If you don't know the number of pages in the PDF, you can open a rasterizer once and read its PageCount property.
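
To see why keeping only encoded bytes and disposing each bitmap matters, here is a rough back-of-the-envelope estimate of uncompressed page sizes. It is only a sketch: the class and method names are hypothetical, and the US Letter page size (8.5 x 11 in) and 3 bytes per pixel (24-bit RGB) are assumptions, not values taken from Ghostscript.NET.

```csharp
using System;

public static class RasterMemoryEstimate
{
    // Uncompressed size in bytes of one rasterized page:
    // (width in pixels) * (height in pixels) * (bytes per pixel).
    public static long PageBytes(double widthInches, double heightInches, int dpi, int bytesPerPixel)
    {
        long widthPx = (long)Math.Ceiling(widthInches * dpi);
        long heightPx = (long)Math.Ceiling(heightInches * dpi);
        return widthPx * heightPx * bytesPerPixel;
    }

    // Total bytes if pageCount such pages are all held in memory at once.
    public static long BatchBytes(int pageCount, double widthInches, double heightInches, int dpi, int bytesPerPixel)
    {
        return pageCount * PageBytes(widthInches, heightInches, dpi, bytesPerPixel);
    }
}
```

Under those assumptions, `PageBytes(8.5, 11, 500, 3)` is 70,125,000 bytes (about 70 MB) per page, and 150 such pages held at once come to roughly 10.5 GB. Even at 300 dpi the batch is about 3.8 GB, well beyond what a 32-bit process can address, which is consistent with the error persisting at the lower resolution.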

4timepi