8

I'd like to take a PDF file and convert it to images, each PDF page becoming a separate image.

"Convert a .doc or .pdf to an image and display a thumbnail in Ruby?" is a similar post, but it doesn't cover how to make separate images for each page.

the Tin Man
  • 158,662
  • 42
  • 215
  • 303
tybro0103
  • 48,327
  • 33
  • 144
  • 170

3 Answers

58

Using RMagick itself, you can create images for different pages:

require 'rmagick' # lowercase name; 'RMagick' is the older, deprecated spelling
pdf_file_name = "test.pdf"
im = Magick::Image.read(pdf_file_name)

The code above returns an array, im, with one entry per page. Indexes are zero-based, so to generate a JPEG image of the fifth page:

im[4].write(pdf_file_name + ".jpg")

But this loads every page of the PDF before you can pick one, so it can be slow for large documents.

Alternatively, to create an image of the fifth page without loading the complete PDF file, append a zero-based page index in brackets to the file name:

require 'rmagick'
pdf_file_name = "test.pdf"
# "[4]" selects only the fifth page (page indexes are zero-based)
im = Magick::Image.read("#{pdf_file_name}[4]")
im[0].write(pdf_file_name + ".jpg")
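
Because the bracketed index is zero-based, it is easy to be off by one page. A minimal sketch of a wrapper that takes 1-based page numbers is below; the helper names page_spec and read_page are hypothetical, not part of RMagick, and the error message assumes the empty-result case is caused by a missing Ghostscript install, as reported in the comments.

```ruby
# Build the "file[index]" specifier that ImageMagick understands.
# page_number is 1-based (as a person would count pages);
# the index inside the brackets is zero-based.
def page_spec(pdf_path, page_number)
  raise ArgumentError, "page numbers start at 1" if page_number < 1
  "#{pdf_path}[#{page_number - 1}]"
end

# Read a single page of the PDF and return it as a Magick::Image.
def read_page(pdf_path, page_number)
  # Required here so page_spec can be used without RMagick loaded.
  require 'rmagick'
  image = Magick::Image.read(page_spec(pdf_path, page_number)).first
  raise "no image returned (check that Ghostscript is installed)" if image.nil?
  image
end
```

With this in place, read_page("test.pdf", 5).write("test-page-5.jpg") would write the fifth page to a file name that contains no brackets.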
Akash Agrawal
  • 4,711
  • 5
  • 28
  • 26
  • 2
    Thanks for tip on the index in the path! Works great, even though it's a dirty hack ;) – SciPhi Jun 17 '14 at 15:37
  • 7
    This is great, but I got stuck for a long time not knowing I needed to "brew install ghostscript" as well to have the reader return more than an empty array. If you're an OSX user it might not come standard for you as well. – Matthew Du Pont Sep 05 '15 at 17:13
  • If you run into issues installing RMagick on OSX, read this to get things installed properly. http://blog.paulopoiati.com/2013/01/28/installing-rmagick-in-mac-os-x-mountain-lion-with-homebrew/ – Nick N Jun 01 '16 at 22:52
  • Does not work for me. `im` is always an empty array in my case. Does it depend on the PDF? – Hendrik Jan 17 '17 at 19:50
  • 1
    Thanks @MatthewDuPont. I was getting empty array until I installed ghostscript. – Simmi Badhan Mar 31 '17 at 03:04
22

ImageMagick can do that with PDFs. Presumably RMagick can do it too, but I'm not familiar with it.

The code from the post you linked to:

require 'rmagick' # lowercase name; 'RMagick' is the older, deprecated spelling
pdf = Magick::ImageList.new("doc.pdf")

pdf is an ImageList object, which according to the documentation delegates many of its methods to Array. You should be able to iterate over pdf and call write to write the individual images to files.
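
The iteration described above could look roughly like the following sketch. The output naming scheme (page_filename) and the write_all_pages method are assumptions made for illustration; only Magick::ImageList.new, each_with_index, and write come from RMagick and Ruby themselves.

```ruby
# Output name for a zero-based page index, e.g. "doc-page-001.png".
# The naming scheme is an arbitrary choice for this example.
def page_filename(pdf_path, index, ext = "png")
  format("%s-page-%03d.%s", File.basename(pdf_path, ".*"), index + 1, ext)
end

# Write every page of the PDF to its own image file in out_dir
# and return the list of paths that were written.
def write_all_pages(pdf_path, out_dir = ".")
  # Required here so page_filename can be used without RMagick loaded.
  require 'rmagick'
  pages = Magick::ImageList.new(pdf_path) # one image per PDF page
  written = []
  pages.each_with_index do |page, index|
    target = File.join(out_dir, page_filename(pdf_path, index))
    page.write(target) # the file extension selects the output format
    written << target
  end
  written
end
```

Calling write_all_pages("doc.pdf") would then produce doc-page-001.png, doc-page-002.png, and so on, one file per page.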

Edward Dale
  • 29,597
  • 13
  • 90
  • 129
2

Since I can't find a way to deal with PDFs on a per-page basis in RMagick, I'd recommend first splitting the PDF into pages with pdftk's burst command, then dealing with the individual pages in RMagick. This is probably less performant than an all-in-one solution, but unfortunately no all-in-one solution presents itself.

There's also PDF::Toolkit for Ruby that hooks into pdftk but I've never used it.
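
A minimal sketch of the pdftk approach, shelling out from Ruby, is below. It assumes pdftk is installed and on the PATH; the helper names burst_command and burst_pdf and the page_%03d.pdf naming pattern are choices made for this example.

```ruby
# Build the argument list for: pdftk in.pdf burst output out_dir/page_%03d.pdf
# pdftk expands the %03d pattern itself, one output file per page.
def burst_command(pdf_path, out_dir)
  ["pdftk", pdf_path, "burst", "output", File.join(out_dir, "page_%03d.pdf")]
end

# Split the PDF into single-page PDFs and return their paths in page order.
def burst_pdf(pdf_path, out_dir)
  # Passing the arguments as separate strings avoids shell interpretation.
  ok = system(*burst_command(pdf_path, out_dir))
  raise "pdftk failed (is it installed and on the PATH?)" unless ok
  Dir.glob(File.join(out_dir, "page_*.pdf")).sort
end
```

Each path returned by burst_pdf is a one-page PDF that can then be read and written as an image with RMagick.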

Jordan Running
  • 102,619
  • 17
  • 182
  • 182