Does pdf.js only work with certain pdfs?

Question

I am trying to render PDF pages to a canvas using pdf.js. To do this, I used K3N's answer to "Render .pdf to single Canvas using pdf.js and ImageData". The code is available in Fiddle1.

The problem is that this seems to work with certain pdfs only.

For example the code works fine for http://arxiv.org/pdf/1207.0102v2.pdf in Fiddle2.

However, when I tried the same code for http://infolab.stanford.edu/pub/papers/google.pdf in Fiddle3 it failed to work.

Why is this happening and can it be fixed?

score 5 · Accepted Answer · edited May 23 '17 at 10:26

It is supposed to work with all pdf files, unless they are corrupted. The error you have here is:

XMLHttpRequest cannot load http://infolab.stanford.edu/pub/papers/google.pdf. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://fiddle.jshell.net' is therefore not allowed access.

In other words, you can't load it this way because the server behind http://infolab.stanford.edu/pub/papers/google.pdf doesn't send the CORS header that would let you do so. Once you host the file on your own server and serve it with the proper response headers, it will most probably work.
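To make the error message more concrete, here is a simplified sketch of the check the browser performs on the response. The function name `isOriginAllowed` is hypothetical, and the real CORS algorithm also covers credentials and preflight requests, which this sketch ignores.

```javascript
// Simplified model of the browser's Access-Control-Allow-Origin check.
// Assumption: no credentials and no preflight are involved.
function isOriginAllowed(allowOriginHeader, pageOrigin) {
  // No header at all: the cross-origin read is blocked.
  if (allowOriginHeader == null) return false;
  const value = allowOriginHeader.trim();
  // "*" allows any origin; otherwise the value must match the page's origin exactly.
  return value === "*" || value === pageOrigin;
}

// The Stanford server sends no such header, so the fiddle is refused.
console.log(isOriginAllowed(undefined, "http://fiddle.jshell.net"));
// A proxy (or your own server) that sends "*" would be accepted.
console.log(isOriginAllowed("*", "http://fiddle.jshell.net"));
```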

For more information about this error, refer to Why am I seeing an "origin is not allowed by Access-Control-Allow-Origin" error here?

If you don't host these files, you can pipe them through a proxy (which can be a third-party app or your own server). For example, Ivan Žužak developed urlreq, a tool that does exactly what we need in this situation.

Instead of using the direct link to the PDF file, use Ivan's proxy URL:

http://urlreq.appspot.com/req?method=GET&url=http%3A%2F%2Finfolab.stanford.edu%2Fpub%2Fpapers%2Fgoogle.pdf
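If you need to do this for more than one file, the proxy URL can be built in code rather than by hand. This is a minimal sketch: the helper name `buildProxyUrl` is made up for illustration, and the query format is simply copied from the URL above.

```javascript
// Hypothetical helper: wraps a PDF URL in the urlreq proxy URL shown above.
function buildProxyUrl(pdfUrl) {
  const proxyBase = "http://urlreq.appspot.com/req";
  // The target URL must be percent-encoded so that its own ":" and "/"
  // characters are not mistaken for part of the proxy URL.
  return proxyBase + "?method=GET&url=" + encodeURIComponent(pdfUrl);
}

const proxied = buildProxyUrl("http://infolab.stanford.edu/pub/papers/google.pdf");
console.log(proxied);
```

The resulting string can then be passed to pdf.js wherever the direct link was used before.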

JSFIDDLE


answered Oct 02 '15 at 14:54

Ionică Bizău


Is there a way to download the file temporarily to my server before using pdf.js, then delete it? – user3741635 Oct 02 '15 at 15:00
@user3741635 Of course. It would not be an actual download, though, but piping from the network to the response. That can be done using a proxy. – Ionică Bizău Oct 02 '15 at 15:47
One drawback of using `urlreq` is that the browser doesn't seem to cache the file, so every time you refresh, the PDF file has to load again. – user3741635 Oct 02 '15 at 16:15
@user3741635 Well, `urlreq` is just an example. You can build such a proxy yourself with caching enabled. Don't forget to mark the answer as accepted by clicking the `✔` button. Thanks. – Ionică Bizău Oct 03 '15 at 16:10
Thanks for the reply. Do you know of any references that explain how to build such a proxy? – user3741635 Oct 03 '15 at 16:58
@user3741635 What is your server-side language? If it's Node.js, you could take a look at [`wrabbit`](https://github.com/jillix/wrabbit), written by me. It adds some wrapping code, but the proxy principle is there: you provide a URL, the server makes the request and streams it to the client. Alternatively, you may want to open a feature request in Ivan's repository. :) – Ionică Bizău Oct 03 '15 at 17:12
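
The proxy principle described in the comments above (take a URL, request it on the server, stream it to the client) can be sketched in plain Node.js roughly as follows. This is not how `wrabbit` or `urlreq` are implemented; the names `parseTarget`, `createPdfProxy`, and `ALLOWED_HOSTS`, the one-day cache lifetime, and port 8080 are all assumptions made for illustration.

```javascript
const http = require("http");
const https = require("https");

// Hypothetical allow-list: a proxy that fetches arbitrary URLs is a security risk.
const ALLOWED_HOSTS = new Set(["infolab.stanford.edu"]);

// Returns a URL object for the ?url= parameter, or null if it is
// missing, malformed, or points at a host that is not allowed.
function parseTarget(requestUrl) {
  const raw = new URL(requestUrl, "http://localhost").searchParams.get("url");
  if (!raw) return null;
  let target;
  try {
    target = new URL(raw);
  } catch (err) {
    return null;
  }
  const okProtocol = target.protocol === "http:" || target.protocol === "https:";
  return okProtocol && ALLOWED_HOSTS.has(target.hostname) ? target : null;
}

function createPdfProxy() {
  return http.createServer((req, res) => {
    const target = parseTarget(req.url);
    if (!target) {
      res.writeHead(400, { "Content-Type": "text/plain" });
      res.end("Missing or disallowed url parameter");
      return;
    }
    const client = target.protocol === "https:" ? https : http;
    client.get(target, (upstream) => {
      res.writeHead(upstream.statusCode, {
        "Content-Type": upstream.headers["content-type"] || "application/pdf",
        // Lets pages on other origins read the response.
        "Access-Control-Allow-Origin": "*",
        // Lets the browser cache the file for a day instead of refetching it.
        "Cache-Control": "public, max-age=86400",
      });
      // Stream the upstream body to the client; nothing is stored on disk.
      upstream.pipe(res);
    }).on("error", () => {
      if (res.headersSent) {
        res.destroy();
        return;
      }
      res.writeHead(502, { "Content-Type": "text/plain" });
      res.end("Upstream request failed");
    });
  });
}

// Usage (not started automatically): createPdfProxy().listen(8080);
```

With the server listening, pdf.js would load `http://localhost:8080/?url=` followed by the percent-encoded PDF URL. The `Cache-Control` header is what addresses the reloading drawback raised about `urlreq`.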
