How to avoid downloading the entire PDF to display

Question

On my web page you can read books in PDF format. The problem is that some books have around 1000 pages and the PDF is really big, so even if the user reads just 10 pages, the browser still downloads the full PDF from the server. This is awful for my hosting account because I have a transfer limit.

What can I do to display the PDF without loading the full file?

I use pdf.js

Greetings.

Are you using the `PDFDoc.getPage(n)` function to dynamically load the pages? — Jonathan M, Sep 16 '14 at 14:27
PDF.js does this by default, try http://mozilla.github.io/pdf.js/web/viewer.html#disableAutoFetch=true — async5, Sep 17 '14 at 12:25

Myst · Accepted Answer · 2016-11-07T01:10:48.633

ORIGINAL POST:

PDF files are designed in a way that forces the client side to download the whole file just to get the first page.

The last line of the PDF file tells the PDF reader where the root dictionary for the PDF file is located (the root dictionary tells the reader about the page catalog - order of pages - and other data used by the reader).
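To illustrate the point about the end of the file, here is a minimal sketch of how a reader could find that pointer. It assumes a conventional PDF whose final lines contain the `startxref` keyword followed by a byte offset; `find_startxref` is a hypothetical helper name, not part of any library.

```ruby
require 'stringio'

# Hypothetical helper: return the byte offset that follows the last
# "startxref" keyword near the end of a PDF, or nil if none is found.
# The IO should be opened in binary mode, e.g. File.open("book.pdf", "rb").
def find_startxref(io, tail_size = 1024)
  io.seek(0, IO::SEEK_END)
  size = io.pos
  # Only the tail of the file is inspected, never the whole document.
  io.seek([size - tail_size, 0].max)
  tail = io.read.to_s
  matches = tail.scan(/startxref\s+(\d+)/)
  matches.empty? ? nil : matches.last.first.to_i
end

# Usage with an in-memory stand-in for a real file:
sample = StringIO.new("%PDF-1.4\n(objects)\nstartxref\n1234\n%%EOF\n")
offset = find_startxref(sample)
```

A reader with random access to the file can do this cheaply. A reader that receives the file as a plain sequential HTTP download only reaches these bytes last.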

So, as you can see, the limitations of the PDF design require that you use a server side solution that will create a new PDF with only the page(s) you want to display.

The best solution (in my opinion) is to create a "reader" page (as opposed to a download page) that requests a specific page from the server and allows the user to advance page by page (using AJAX).

The server will need to create a new PDF (file or stream) that contains only the requested page and return it to the reader.

If you are running your server with Ruby (Ruby on Rails), you can use the combine_pdf gem to load the PDF and send just one page.

You can define a controller method that will look something like this:

def get_page
    # read the book
    book = CombinePDF.load("book.pdf")
    # create empty PDF
    pdf_with_one_page = CombinePDF.new
    # add the page you want
    # notice that the pages array is indexed from 0,
    # so an adjustment to user input is needed...
    # request parameters arrive as strings, so convert to an integer first
    page_index = params[:page_number].to_i - 1
    pdf_with_one_page << book.pages[page_index]
    # no need to create a file, just stream the data to the client.
    send_data pdf_with_one_page.to_pdf, type: 'application/pdf', disposition: 'inline'
end
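Since the page number comes from user input, it is probably worth validating it before indexing into the pages array. A minimal sketch, where `page_index` is a hypothetical helper (not part of Rails or combine_pdf) and `page_count` would come from something like `book.pages.length`:

```ruby
# Hypothetical helper: turn a 1-based page number from user input into a
# 0-based array index. Returns nil when the input is missing, is not an
# integer, or falls outside 1..page_count.
def page_index(raw_page_number, page_count)
  number = Integer(raw_page_number.to_s, 10)
  return nil unless number.between?(1, page_count)
  number - 1
rescue ArgumentError
  # Integer() raises ArgumentError for input such as "", "abc" or "3.5".
  nil
end
```

In the controller, a `nil` result could be answered with a 404 instead of trying to build a PDF from a page that does not exist.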

If you are running PHP or Node.js, you will need to find a different server-side solution.

Good luck!

EDIT:

I was looking over the PDF.js project (which looks very nice) and noticed the limited support statement for Safari: "Safari (desktop and mobile) lacks a number of features or has defects, e.g. in typed arrays or HTTP range requests"...

I understand from this statement that on some browsers you can manage a client-side solution based on the HTTP Byte Serving protocol.

This will NOT work with all browsers, but it will keep you from having to use a server-side solution.
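For reference, byte serving comes down to sending a `Range` header with the request. The sketch below only builds such a request with Ruby's standard `Net::HTTP` and does not send it; `build_range_request` and the example URL are assumptions made for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build (but do not send) a GET request that asks the
# server for only the bytes first..last of a resource.
def build_range_request(url, first, last)
  request = Net::HTTP::Get.new(URI(url))
  request['Range'] = "bytes=#{first}-#{last}"
  request
end

# Ask for the first kilobyte of a (made-up) book URL.
request = build_range_request('http://example.com/book.pdf', 0, 1023)
```

A server that supports byte serving answers such a request with `206 Partial Content` and a `Content-Range` header. A server that does not support it typically answers `200 OK` and sends the whole file.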

I couldn't find the documentation for the PDF.js feature (maybe it defaults to ranges and you just need to set the range...?), but I would go with a server-side solution that I know to work on all browsers.

EDIT 2:

Ignore Edit 1. As iPDFdev pointed out (thank you, iPDFdev), this requires a special layout of the PDF file and will not resolve the issue of the browser downloading the whole file.

You sure about your first statement? I've read large pdfs that allowed me to start on the first page and just showed blanks on later pages until they were downloaded. — Jonathan M, Sep 16 '14 at 14:38
I am sure about the PDF file specifications on the matter. It is the issue of the data known as xref in the file format. I don't know how they worked on starting the first page first (maybe they had direct file system access so they could read the last line first)... — Myst, Sep 16 '14 at 14:42
Yeah, but, for example, this pdf shows the first page before the rest are downloaded: http://www.nbb.be/DOC/BA/PDF7MB/2010/201009400082_1.PDF — Jonathan M, Sep 16 '14 at 14:43
I think that is actually a browser feature... it's called Byte Serving and is done over the HTTP layer. You could look at this post for something similar someone is trying out: http://stackoverflow.com/questions/17643851/downloading-a-portion-of-a-file-using-http-requests — Myst, Sep 16 '14 at 14:52
That PDF feature is called fast web view or linearization and it requires a special layout of the PDF file content. It does make use of Byte Serving for requesting specific parts of the PDF file. It allows Adobe Reader to perform an incremental download and display pages as soon as they are downloaded. The problem is that the whole file is still downloaded in the background, so if you want to minimize the traffic, a server-side solution, like the one above, is required. — iPDFdev, Sep 16 '14 at 15:14

score -1 · Answer 2 · answered Sep 16 '14 at 14:36

You can take the following approach, depending on the functionality you need:

Add a configuration setting (a kind of flag) that says whether the entire PDF may be displayed or downloaded.
While rendering your response, read that flag. If it is set, generate a minimal PDF of 20 pages together with a hyperlink to download the entire PDF; otherwise, generate the minimal 20-page PDF only.
When you prepare the initial response for your web page, include only the minimal PDF (say, 20 pages) and process the response.
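The flag-driven preview described above could be sketched roughly as follows. All names here (`PREVIEW_PAGES`, `preview_plan`, the hash keys) are hypothetical, and the sketch only decides what to serve; actually building the 20-page PDF still needs a server-side PDF library.

```ruby
# Number of pages in the minimal PDF (the "say 20 pages" from the steps above).
PREVIEW_PAGES = 20

# Hypothetical helper: decide which 1-based pages go into the initial,
# minimal PDF and whether the page should also link to the complete file.
def preview_plan(total_pages, allow_full_download)
  last_page = [total_pages, PREVIEW_PAGES].min
  {
    pages: (1..last_page).to_a,
    show_full_download_link: allow_full_download
  }
end

# A 1000-page book with the flag set: 20 preview pages plus a download link.
plan = preview_plan(1000, true)
```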

Kinu

How to "add configuration"? Can you show an example? – Jonathan M Sep 16 '14 at 14:37
It's conceptual... you can add a flag in your response indicating whether entire PDF can be displayed/downloaded or not. This is server side logic which you need to build – Kinu Sep 16 '14 at 14:40
Do you know how to do this? – Jonathan M Sep 16 '14 at 14:42
Well, Jonathan, that depends on what environment you are developing in. You seem more interested in this than Fylux! – Kinu Sep 17 '14 at 05:15
