19

As per the title: is there any way to load the pages of an unprotected PDF file as JavaScript Image() objects?

It would also be OK to convert the pages before running the JavaScript, but I would like this to happen automatically and without any library that requires installation.

Someone on the internet posted this Bash script. I don't know Bash, but running it was simple enough.

#!/bin/bash
PDF='doc.pdf'
NUMPAGES=`identify -format %n "$PDF"`

for (( IDX=0; IDX<$NUMPAGES; IDX++ ))
do
  PAGE=$(($IDX+1))
  convert -resize 1200x900 "$PDF[$IDX]" `echo "$PDF" | sed "s/\.pdf$/-page$PAGE.jpg/"`
done

echo "Done"

But I got these errors:

line 3: identify: command not found
line 5: ((: IDX<: syntax error: operand expected (error token is "<")

Pre-converting the PDF using a Bash script would be a good solution. Can someone fix the script above or provide an alternative solution?

Many thanks in advance!

Saturnix

2 Answers

34

PDF.js will let you render the PDF to a canvas. Then you can do something like:

var img = new Image();
img.src = pdfCanvas.toDataURL();

I've been very impressed with PDF.js. I love letting the client's browser do as much of the work for me as possible.

Demo here: http://jsbin.com/pdfjs-helloworld-v2/1/edit
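The "render the PDF to a canvas" step is not spelled out above, so here is a minimal sketch of one way it could look. It assumes a reasonably recent PDF.js build in which `getDocument(...).promise`, `getPage`, `getViewport({ scale })` and `render(...).promise` are available (older releases used slightly different signatures). The function name `pdfPagesToImages` and its `env` parameter are hypothetical; `env` only exists so the function can be exercised outside a browser and defaults to the page's own globals.

```javascript
// Sketch: turn every page of an already loaded PDF.js document into an Image.
// `pdf` is the object resolved by pdfjsLib.getDocument(url).promise.
// `env` is a hypothetical injection point; in a browser it is the global scope.
async function pdfPagesToImages(pdf, scale = 1.5, env = globalThis) {
  const images = [];
  // PDF.js page numbers start at 1, not 0.
  for (let n = 1; n <= pdf.numPages; n++) {
    const page = await pdf.getPage(n);
    const viewport = page.getViewport({ scale });

    // One off-screen canvas per page, sized to the scaled page.
    const canvas = env.document.createElement('canvas');
    canvas.width = Math.floor(viewport.width);
    canvas.height = Math.floor(viewport.height);

    // Wait until the page has been fully drawn before reading the pixels.
    await page.render({
      canvasContext: canvas.getContext('2d'),
      viewport: viewport,
    }).promise;

    const img = new env.Image();
    img.src = canvas.toDataURL('image/png');
    images.push(img);
  }
  return images;
}
```

In a page that has loaded PDF.js, usage might be `pdfjsLib.getDocument('doc.pdf').promise.then(pdfPagesToImages)`, which resolves to an array with one Image per page. The global name `pdfjsLib` is an assumption that depends on how the library was included.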

Trevor Dixon
1

Looks like the first issue is a missing executable: identify. This is part of ImageMagick:

http://www.imagemagick.org/script/index.php

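Make sure it is also on your PATH. The second error follows from the first: because identify could not run, NUMPAGES is empty, so the loop condition becomes "IDX<", which is not a valid arithmetic expression.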
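Once the script runs, the pre-converted pages still have to be loaded on the JavaScript side. A minimal sketch, assuming the JPEGs are served from the same directory as the page and that the page count is known to the script; the function names `pageImageNames` and `loadPageImages` are made up for illustration. The file names follow the `-page$PAGE.jpg` pattern that the sed expression in the script produces.

```javascript
// Build the file names the Bash script writes:
// "doc.pdf" -> "doc-page1.jpg", "doc-page2.jpg", ...
function pageImageNames(pdfName, numPages) {
  const names = [];
  for (let page = 1; page <= numPages; page++) {
    names.push(pdfName.replace(/\.pdf$/, '-page' + page + '.jpg'));
  }
  return names;
}

// Browser-only: create one Image per pre-converted page.
// Assumes the JPEGs are reachable at URLs relative to the current page.
function loadPageImages(pdfName, numPages) {
  return pageImageNames(pdfName, numPages).map(function (name) {
    const img = new Image();
    img.src = name; // the browser starts fetching the file here
    return img;
  });
}
```

For example, `loadPageImages('doc.pdf', 3)` would request doc-page1.jpg through doc-page3.jpg.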

Abdullah Jibaly
  • So it uses ImageMagick! Too bad, I didn't want to use it, but it looks like it is the only solution. Thx! – Saturnix Oct 16 '12 at 18:31
  • @Saturnix `convert` is also part of ImageMagick; this script is entirely based on ImageMagick. – Ken Oct 16 '12 at 18:36