4

I have a PDF that contains a vector image. I asked the client about it, and they said they created the image in Illustrator and saved it as a PDF. Is there a way to extract that image and convert it into a PNG? I've tried code from the following:

Extract image from PDF using itextsharp

http://www.vbforums.com/showthread.php?530736-2005-Extract-Images-from-a-PDF-file-using-iTextSharp

and a couple of other links that I can no longer find, but none of them seem to work. My theory is that they extract embedded raster images such as JPEGs, BMPs, and PNGs, whereas what I have is a direct export from Illustrator.

Should I be using an Illustrator SDK, or is there a way to do it with iTextSharp? Also, I need to convert it to a standard image format, like PNG, and send the stream to a calling app, so I'll need to be able to get the result as a stream.

JohnathanKong
  • This sounds like a single-time use case. Since your client created the image and supplied it to you, would you be able to request that they simply supply it in a PNG format? Or you could open the PDF, size it on screen how you want, and do a screen capture. – mbmcavoy May 17 '13 at 18:57
  • Unfortunately the client will be supplying ALL their images like this. The reason why is because the site will spit out different size images based on the image size request, or if the user wants, they can download the vector version of it. – JohnathanKong May 17 '13 at 19:22
  • 1
    OK, so there will be a significant number of images to be processed this way? Still, this seems like PDF is a poor choice of format. Perhaps SVG? As this is an open format with wide support, I'm sure you can programmatically convert to PNG or PDF on demand. – mbmcavoy May 17 '13 at 22:19
  • 1
    Why can't you simply ask the client to supply you with the illustrator files instead of pdf versions? I find it hard to believe they wouldn't have the illustrator files themselves... – Lasse V. Karlsen May 18 '13 at 17:02
  • Unfortunately the previous project manager has told them to use this format, and they've become accustomed to it. As we all know, clients can be quite demanding, especially when they don't understand technology. What also has us stuck is that their current system does all this so they always say, "if the current system can do it, why can't yours?". – JohnathanKong May 21 '13 at 01:54

2 Answers

0

You will not be able to do this with iText, since it cannot render or rasterize vector graphics in PDF files.

Option 1:
If a GPL license works for you, you could rasterize your PDF file with ImageMagick + GNU Ghostscript, but AFAIK you will have to write the output into a file in this case.

Command line sample:

convert -density 300 -depth 8 c:\temp\mydoc.pdf c:\temp\myrasterimage.png

There is also a .NET wrapper on CodePlex that might work for you: ImageMagick.NET
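
If a temporary file is a problem, it may be worth testing whether ImageMagick's standard-output form works in your environment: when the output name is given as `png:-`, convert writes the PNG data to stdout, and a `[n]` suffix on the input name selects a single zero-based page. The following is only a minimal sketch under those assumptions. `PdfRasterizer`, `BuildConvertArgs`, and `RasterizeToStream` are hypothetical names, the path to convert is something you would supply, and it assumes your host allows launching external processes and has Ghostscript installed.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.IO;

public static class PdfRasterizer
{
    // Builds the argument string for ImageMagick's convert tool.
    // "[pageIndex]" selects one zero-based page; "png:-" sends PNG data to stdout.
    public static string BuildConvertArgs(string pdfPath, int pageIndex, int dpi)
    {
        return string.Format(
            CultureInfo.InvariantCulture,
            "-density {0} -depth 8 \"{1}[{2}]\" png:-",
            dpi, pdfPath, pageIndex);
    }

    // Runs convert and copies its standard output into a MemoryStream.
    // convertExe is the path to the convert executable (an assumption of this sketch).
    public static MemoryStream RasterizeToStream(string convertExe, string pdfPath, int pageIndex, int dpi)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = convertExe,
            Arguments = BuildConvertArgs(pdfPath, pageIndex, dpi),
            UseShellExecute = false,          // required for stream redirection
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            var result = new MemoryStream();
            // Drain stdout fully before waiting, so the child never blocks on a full pipe.
            process.StandardOutput.BaseStream.CopyTo(result);
            process.WaitForExit();
            if (process.ExitCode != 0)
            {
                throw new InvalidOperationException("convert exited with code " + process.ExitCode);
            }
            result.Position = 0;              // rewind so the caller can read from the start
            return result;
        }
    }
}
```

A call might look like `PdfRasterizer.RasterizeToStream(@"c:\tools\convert.exe", @"c:\temp\mydoc.pdf", 0, 300)`, where both paths are placeholders. Ghostscript may still use temporary files internally, so this only removes the output file from your own code.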

Option 2:
If a commercial library is an option for you, you could try Amyuni PDF Creator .Net. You can either use the method IacDocument.ExportToJpg, which requires writing into a file, or the method IacDocument.DrawCurrentPage, which can be useful for writing the output into a memory stream.

Sample code for exporting one page using IacDocument.DrawCurrentPage into a memory stream:

// Requires the namespaces System, System.Drawing, System.Drawing.Imaging and System.IO
const int twipsPerInch = 1440;
const int MM_ISOTROPIC = 7;
private static MemoryStream RasterizePDF(string filePath, int pageIndex, int targetDPI)
{
    Amyuni.PDFCreator.IacDocument doc = new Amyuni.PDFCreator.IacDocument();
    doc.SetLicenseKey("Evaluation", "07EFC00...77C23E29");
    using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    {
        doc.Open(fs, "");
        //Get the width and height of the target page
        Amyuni.PDFCreator.IacPageFormat format = doc.GetPage(pageIndex).GetPageFormat();
        doc.CurrentPageNumber = pageIndex;

        //Create the image at the requested resolution
        using (Bitmap img = new Bitmap((int)(format.Width * targetDPI / twipsPerInch), (int)(format.Length * targetDPI / twipsPerInch), PixelFormat.Format32bppArgb))
        {
            using (Graphics g = Graphics.FromImage(img))
            {
                //Set the image background to white
                g.Clear(Color.White);
                //Get a device context for the graphics object
                IntPtr hdc = g.GetHdc();
                try
                {
                    SetMapMode(hdc, MM_ISOTROPIC);
                    //Set the scaling factor (twips to pixels)
                    SetWindowExtEx(hdc, twipsPerInch, twipsPerInch, IntPtr.Zero);
                    SetViewportExtEx(hdc, targetDPI, targetDPI, IntPtr.Zero);
                    //Draw the contents of the PDF page onto the graphics context
                    doc.DrawCurrentPage(hdc, false);
                }
                finally
                {
                    //Always release the device context, even if drawing fails
                    g.ReleaseHdc(hdc);
                }
            }
            //Save the bitmap as PNG into the resulting stream
            MemoryStream resultStrm = new MemoryStream();
            img.Save(resultStrm, ImageFormat.Png);
            //Rewind the stream so the caller can read it from the start
            resultStrm.Position = 0;
            return resultStrm;
        }
    }
}

//The last parameter of the two *ExtEx functions is an optional pointer to a SIZE
//structure that receives the previous extents; IntPtr.Zero means "not needed".
[System.Runtime.InteropServices.DllImportAttribute("gdi32.dll")]
private static extern int SetMapMode(IntPtr hdc, int MapMode);
[System.Runtime.InteropServices.DllImportAttribute("gdi32.dll")]
private static extern int SetWindowExtEx(IntPtr hdc, int nXExtent, int nYExtent, IntPtr lpSize);
[System.Runtime.InteropServices.DllImportAttribute("gdi32.dll")]
private static extern int SetViewportExtEx(IntPtr hdc, int nXExtent, int nYExtent, IntPtr lpSize);

Disclaimer: I currently work as a developer of the library

yms
  • ImageMagick seems to be the only free option out there, and as much as I love Amyuni, it is out of our price range at the moment for such a small project. From the looks of ImageMagick.NET, development seems to have stopped, since the last release was in 2009, which means there will probably be no support for streams. My hosting doesn't have file storage, so I'm forced to use a cross between a PHP server and my .NET web services. This is an extremely poor way of doing things, but the alternative is to update the ImageMagick source code or get a real Windows server. – JohnathanKong May 21 '13 at 19:03
  • ImageMagick relies on Ghostscript for PDF rasterization, and as far as I know there is no way in Ghostscript to get its output into a memory stream. I might be wrong, but if I am not, I guess there will be no way for you to achieve this (I mean getting the output in a memory stream) with ImageMagick. – yms May 21 '13 at 19:06
  • Also take into account that it might not be legal for you to use ImageMagick+Ghostscript for free in a commercial closed-source application. But I am not a lawyer, so [YMMV](http://en.wiktionary.org/wiki/your_mileage_may_vary). – yms May 21 '13 at 19:10
  • Thank you for the heads up. I'll look into the legality of this. – JohnathanKong May 21 '13 at 19:28
0

Modern versions of Illustrator (AI) use PDF as an export format. It's an enhanced form of PDF containing important metadata for Illustrator, but ultimately it is PDF.

Yes, most PDF packages are aimed at extracting bitmaps, as these come in atomic lumps. If your embedded image is vector, then it has been dropped in using a format that most of them will not understand.

Illustrator may have used its own metadata to delimit the image. If this is the case, then it will be difficult to extract. However, it may have used a PDF analog like the Form XObject. If I were designing Illustrator, I would probably do both.

So it probably is possible to extract, though perhaps a little tricky. It is impossible to say more without being able to see the document.

If you would like to mail your illustrator file to us at ABCpdf we will certainly see what we can suggest. :-)