Parsing PDF pages as JavaScript Images

Question

As per the title, is there any way to parse the pages of an unprotected PDF file into JavaScript Image() objects?

It would also be OK to convert them before running the JavaScript, but I would like this to happen automatically and without any library that requires installation.

Someone on the internet posted this Bash script. I don't know Bash, but running it was simple enough.

#!/bin/bash
PDF='doc.pdf'
NUMPAGES=`identify -format %n "$PDF"`

for (( IDX=0; IDX<$NUMPAGES; IDX++ ))
do
  PAGE=$(($IDX+1))
  convert -resize 1200x900 "$PDF[$IDX]" `echo "$PDF" | sed "s/\.pdf$/-page$PAGE.jpg/"`
done

echo "Done"

But I got these errors:

line 3: identify: command not found
line 5: ((: IDX<: syntax error: operand expected (error token is "<")

Pre-converting the PDF with a Bash script would be a good solution. Can someone fix the script above or provide an alternative?

Many thanks in advance!

Why not use Python and one of the gazillion libraries you can use for free? – Ken, Oct 16 '12 at 18:32

score 34 · Accepted Answer · answered Oct 16 '12 at 18:38


PDF.js will let you render the PDF to a canvas. Then you can do something like:

// pdfCanvas is the <canvas> element PDF.js rendered the page into
var img = new Image();
img.src = pdfCanvas.toDataURL(); // PNG data URL by default

I've been very impressed with PDF.js. I love letting the client's browser do as much of the work for me as possible.

Demo here: http://jsbin.com/pdfjs-helloworld-v2/1/edit
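
For a multi-page document, the same idea can be wrapped in a loop. What follows is a minimal sketch, not the code from the demo: it assumes a recent PDF.js build is loaded in the page and exposed as `pdfjsLib`, and that the PDF is served from the same origin (or with CORS enabled). The function name `pdfToImages` and its parameters are made up for illustration.

```javascript
// Sketch only: `pdfToImages`, `scale` and `doc` are illustrative names,
// not part of PDF.js. `pdfjsLib` is assumed to be the PDF.js library object.
async function pdfToImages(pdfjsLib, url, scale = 1.5, doc = document) {
  const pdf = await pdfjsLib.getDocument(url).promise;
  const images = [];

  // PDF.js page numbers are 1-based.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const viewport = page.getViewport({ scale });

    // Off-screen canvas sized to the rendered page.
    const canvas = doc.createElement('canvas');
    canvas.width = Math.floor(viewport.width);
    canvas.height = Math.floor(viewport.height);

    await page.render({
      canvasContext: canvas.getContext('2d'),
      viewport,
    }).promise;

    // Same step as above: canvas -> data URL -> image element.
    // createElement('img') yields the same kind of object as new Image().
    const img = doc.createElement('img');
    img.src = canvas.toDataURL('image/png');
    images.push(img);
  }
  return images;
}
```

In a browser it could be called as `pdfToImages(pdfjsLib, 'doc.pdf').then(function (imgs) { imgs.forEach(function (img) { document.body.appendChild(img); }); });`. The data URL is assigned synchronously, but decoding is not, so wait for each image's `load` event before reading its dimensions.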


Trevor Dixon



Here is a gist with working code: https://gist.github.com/ichord/9808444 – Anfuca Mar 13 '17 at 08:18

@MPV Not working anymore. Would you have another link? – Basj Oct 30 '18 at 13:51
@Basj it did load but slowly. But it works. If errors, let me know what you see. – MPV Oct 31 '18 at 14:10

score 1 · Answer 2 · answered Oct 16 '12 at 18:27


Looks like the first issue is a missing executable: identify. This is part of ImageMagick:

http://www.imagemagick.org/script/index.php

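Make sure it is also on your PATH. The second error follows from the first: with identify missing, NUMPAGES ends up empty, so the loop condition becomes IDX< and Bash complains about the missing operand.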


Abdullah Jibaly


So it uses ImageMagick! Too bad, I didn't want to use it, but it looks like it's the only solution. Thanks! – Saturnix Oct 16 '12 at 18:31
@Saturnix `convert` is also part of ImageMagick; this script is based entirely on ImageMagick. – Ken Oct 16 '12 at 18:36
