11

Convert a .doc or .pdf to an image and display a thumbnail in Ruby?
Does anyone know how to generate document thumbnails in Ruby (or C, Python, ...)?

e-sushi

7 Answers

22

A simple RMagick example to convert a PDF to a PNG would be:

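require 'rmagick' # lowercase; the capitalized 'RMagick' name is deprecated
pdf = Magick::ImageList.new("doc.pdf") # one image per PDF page
thumb = pdf.first.resize_to_fit(300, 300) # first page, aspect ratio preserved
thumb.write "doc.png"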

Converting an MS Word document is not as easy. Your best option may be to convert it to a PDF first and then generate the thumbnail from that. How you generate the PDF depends heavily on the OS you're running on. One option is to use OpenOffice and the Python Open Document Converter. There are also online conversion services you could try, including http://Zamzar.com.

tomafro
  • It works. But it is time consuming. Can I just read the first page of the pdf and get its image version? – aisensiy Aug 17 '13 at 10:26
4

Sample code to answer the comment by @aisensiy above:

require 'rmagick'
pdf_path = "/path/to/interesting/file.pdf"
page_index_path = pdf_path + "[0]" # first page in PDF
pdf_page = Magick::Image.read( page_index_path ).first # first item in Magick::ImageList
pdf_page.write( "/tmp/indexed-page.png" ) # implicit conversion based on file extension

Based on the path hint in an answer to another question:

https://stackoverflow.com/a/6369524/765063

SciPhi
0

If you don't mind paying for Imgix, it handles PDFs too. You get all the benefits of a fast CDN with it.

Jan Klimo
0

Not sure about .doc support in any open source library, but ImageMagick (and the RMagick gem) can be compiled with PDF support (I think it's on by default).

Loren Segal
0

PDF support is a little buggy in ImageMagick, but it's by far the best open-source option for Ruby. There's also a Google Summer of Code project for pure Ruby PDF support.

I've read about using OpenOffice without the GUI to convert .doc files, but it'll be complicated at best.

0

As the 2 previous posters said, ImageMagick's probably the easiest way to generate the thumbnails.

You could exec something like:

`convert doc.pdf -thumbnail 300x300 doc.png`

(The backquotes tell Ruby to run the command in a shell. Note that -thumbnail, not -size, is the option that resizes the output.)

If you don't want to shell out for the conversion, you could use the RMagick gem instead, but it's probably a bit more code.
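A minimal sketch of the shell-out approach without backquotes, assuming ImageMagick's convert is on the PATH. The helper names thumbnail_command and make_thumbnail are made up for illustration. Passing the arguments as an array skips the shell, so file names containing spaces or brackets need no quoting, and the "[0]" suffix restricts the work to the first page.

```ruby
# Build the argument list for ImageMagick's `convert`.
# "[0]" selects the first page; -thumbnail fits the result inside the
# given box while keeping the aspect ratio.
def thumbnail_command(pdf_path, png_path, box = '300x300')
  ['convert', "#{pdf_path}[0]", '-thumbnail', box, png_path]
end

# Run the command without a shell. Returns true on success,
# false on a non-zero exit, nil if the program could not be started.
def make_thumbnail(pdf_path, png_path)
  system(*thumbnail_command(pdf_path, png_path))
end
```

Usage would look like make_thumbnail('doc.pdf', 'doc.png').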

Federico Builes
0

LibreOffice helped me to convert .doc, .docx or .rtf to images. Install LibreOffice on your server:

sudo apt install libreoffice-common
sudo apt install libreoffice-writer

Test it in your terminal:

soffice --draw --convert-to pdf some_file.doc && convert -density 288x288 -units pixelsperinch some_file.pdf -background white -alpha background -alpha off -quality 100 -resize 25% img_name.png

As you can see, this first converts the file to PDF and then converts the PDF to images. You might need to edit the ImageMagick policy file:

# open the ImageMagick policy file
sudo nano /etc/ImageMagick-6/policy.xml
# and add this line
<policy domain="coder" rights="read|write" pattern="PDF" />
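The two-step conversion could be driven from Ruby along these lines. This is only a sketch under assumptions: soffice and convert are on the PATH, the --headless and --outdir options are used here in place of --draw from the command above, only the first page is rendered, and all helper names are hypothetical.

```ruby
require 'open3'
require 'tmpdir'

# Step 1: LibreOffice writes <basename>.pdf into out_dir.
def pdf_command(doc_path, out_dir)
  ['soffice', '--headless', '--convert-to', 'pdf', '--outdir', out_dir, doc_path]
end

# Where LibreOffice is expected to put the converted file.
def pdf_path_for(doc_path, out_dir)
  File.join(out_dir, "#{File.basename(doc_path, '.*')}.pdf")
end

# Step 2: render the first PDF page on a white background as a thumbnail.
def png_command(pdf_path, png_path)
  ['convert', '-density', '150', "#{pdf_path}[0]",
   '-background', 'white', '-alpha', 'remove',
   '-thumbnail', '300x300', png_path]
end

# Run one command without a shell and raise if it exits non-zero.
def run!(cmd)
  _out, err, status = Open3.capture3(*cmd)
  raise "#{cmd.first} failed: #{err}" unless status.success?
end

# Convert a .doc/.docx/.rtf file to a PNG thumbnail via an intermediate PDF.
def doc_to_thumbnail(doc_path, png_path, out_dir = Dir.tmpdir)
  run!(pdf_command(doc_path, out_dir))
  run!(png_command(pdf_path_for(doc_path, out_dir), png_path))
  png_path
end
```

A call such as doc_to_thumbnail('some_file.doc', 'img_name.png') would leave the intermediate PDF in the temporary directory, so a real application would probably want to clean it up.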

Another tool for converting PDFs to images is pdftoppm:

pdftoppm some_file.pdf img_name -png
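
For a thumbnail of just the first page, pdftoppm could be called from Ruby roughly like this. It is a sketch that assumes pdftoppm (from poppler-utils) is on the PATH; the helper names and the 300-pixel default are illustrative choices, not part of the answer above.

```ruby
# Build a pdftoppm call that renders only page 1 (-f 1 -l 1) as PNG,
# writes a single file named <out_root>.png (-singlefile), and scales
# the longer side to max_px pixels (-scale-to).
def pdftoppm_command(pdf_path, out_root, max_px = 300)
  ['pdftoppm', '-png', '-f', '1', '-l', '1', '-singlefile',
   '-scale-to', max_px.to_s, pdf_path, out_root]
end

# Returns the path of the generated PNG, or nil if pdftoppm failed.
def pdftoppm_thumbnail(pdf_path, out_root, max_px = 300)
  system(*pdftoppm_command(pdf_path, out_root, max_px)) ? "#{out_root}.png" : nil
end
```
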
Mr_Getman