Pass PDF document to user-agent via XHR, using modern (HTML5) methods, without compromising document encoding

Question

I am trying to accomplish exactly what is described in the question "Handle file download from ajax post", but with a dynamically generated PDF file, which I'm generating with PHP v5.6.10 (using the third-party PDFLib extension, v9.0.5).

Yet when the PDF file is downloaded, it is corrupted: none of the PDF readers I've tried can open it, and every observation suggests that the file content is being butchered somewhere between printing the PDF content to the response body and saving the file in the user-agent (web browser) with JavaScript.

I happen to be using jQuery v2.1.4, but I'm not sure that it matters, ultimately.

Important Provisos

I should mention that, like the other asker (cited above), I have an HTML form that users fill out and submit via the POST verb. The form submission is performed with JavaScript because there are actually 5 forms, displayed in a tabbed layout, that are submitted simultaneously, and any validation errors must be sent back via AJAX and displayed without refreshing the entire page. I mention this to make clear that this is a POST request, which may return either a) a JSON object (containing, primarily, validation error strings), or b) a string that represents a PDF document, which should be presented to the user-agent as a file download.

My Code

The JavaScript

$('#submit-button').click(function() {
    $.ajax({
        url: $(this).data('action'),
        type: 'post',
        data: $($(this).data('forms')).serialize(),
        processData: false,
        statusCode: {
            500: function() {
                alert('An internal server error occurred. Go pound sand.');
            }
        }
    }).done(function(data, status, xhr) {
        processResponse(data, status, xhr);
    }).fail(function(jqXHR, textStatus) {
        if (textStatus === 'timeout') {
            alert('The request timed-out. Please try again.');
        }
    });
});

function processResponse(response, status, xhr)
{
    if (response !== null && typeof response === 'object') {
        //The server will return either a JSON string (if the input was invalid)
        //or the PDF file. We land here in the former case.
    }
    else {
        //This doesn't change the behavior.
        xhr.responseType = 'blob';

        //This doesn't change the behavior, either.
        //xhr.overrideMimeType('text\/plain; charset=x-user-defined');

        //The remainder of this function taken verbatim from:
        //https://stackoverflow.com/a/23797348

        // check for a filename
        var filename = "";
        var disposition = xhr.getResponseHeader('Content-Disposition');
        if (disposition && disposition.indexOf('attachment') !== -1) {
            var filenameRegex = /filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/;
            var matches = filenameRegex.exec(disposition);
            if (matches != null && matches[1]) filename = matches[1].replace(/['"]/g, '');
        }

        var type = xhr.getResponseHeader('Content-Type');

        //Is logged to console as "application/pdf".
        console.log(type);

        var blob = new Blob([response], { type: type });

        if (typeof window.navigator.msSaveBlob !== 'undefined') {
            // IE workaround for "HTML7007: One or more blob URLs were revoked by closing the blob for which they were created. These URLs will no longer resolve as the data backing the URL has been freed."
            window.navigator.msSaveBlob(blob, filename);
        } else {
            var URL = window.URL || window.webkitURL;
            var downloadUrl = URL.createObjectURL(blob);

            //Is logged to console as URL() (it's an object, not a string).
            console.log(URL);

            //Is logged to console as "blob:https://example.com/108eb066-645c-4859-a4d2-6f7a42f4f369"
            console.log(downloadUrl);

            //Is logged to console as "pdftest.pdf".
            console.log(filename);

            if (filename) {
                // use HTML5 a[download] attribute to specify filename
                var a = document.createElement("a");
                // safari doesn't support this yet
                if (typeof a.download === 'undefined') {
                    window.location = downloadUrl;
                } else {
                    a.href = downloadUrl;
                    a.download = filename;
                    document.body.appendChild(a);
                    a.click();
                }
            } else {
                window.location = downloadUrl;
            }

            setTimeout(function () { URL.revokeObjectURL(downloadUrl); }, 100); // cleanup
        }
    }
}
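A minimal sketch of requesting the body as binary from the start, assuming the server sends `Content-Type: application/json` for validation errors and `application/pdf` otherwise (as the PHP below does for the PDF case). Setting `responseType` inside `done()` cannot help: the third argument passed to `done()` is jQuery's jqXHR wrapper rather than the native XMLHttpRequest, and by then the body has already been decoded as text. It has to be set on the native object before `send()`. The sketch therefore bypasses `$.ajax`, which in jQuery 2.x hands response bodies over as strings. `submitForms`, `onErrors`, `onPdf`, and `filenameFromDisposition` are hypothetical names; the filename regex is the one from the code above.

```javascript
// Hypothetical helper: extracts the filename using the same regex as processResponse.
function filenameFromDisposition(disposition) {
  if (!disposition || disposition.indexOf('attachment') === -1) {
    return '';
  }
  const matches = /filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/.exec(disposition);
  return matches && matches[1] ? matches[1].replace(/['"]/g, '') : '';
}

// Hypothetical replacement for the $.ajax call: the body arrives as a Blob,
// so the PDF bytes are never decoded as text.
function submitForms(url, serializedForms, onErrors, onPdf) {
  const xhr = new XMLHttpRequest();
  xhr.open('POST', url);
  xhr.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded; charset=UTF-8');
  xhr.responseType = 'blob'; // must be set on the native XHR before send()

  xhr.onload = function () {
    if (xhr.status === 500) {
      alert('An internal server error occurred.');
      return;
    }
    const type = xhr.getResponseHeader('Content-Type') || '';
    if (type.indexOf('application/json') === 0) {
      // Validation errors: read the (small) Blob back as text, then parse it.
      const reader = new FileReader();
      reader.onload = function () { onErrors(JSON.parse(reader.result)); };
      reader.readAsText(xhr.response);
    } else {
      // PDF: xhr.response is already a Blob holding the server's bytes as sent.
      onPdf(xhr.response, filenameFromDisposition(xhr.getResponseHeader('Content-Disposition')));
    }
  };
  xhr.send(serializedForms);
}
```

Called as `submitForms($(this).data('action'), $($(this).data('forms')).serialize(), showErrors, saveBlob)` (with `showErrors` and `saveBlob` also hypothetical), the `onPdf` callback could pass the Blob straight into the existing `msSaveBlob` / `a[download]` logic, skipping the `new Blob([response], ...)` step that rebuilds the file from a string.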

The PHP

<?php

use File;
use \PDFLib;

class Pdf {

protected $p;
protected $bufferedContent;

public function __construct()
{
    $this->p = new PDFlib();

    $this->p->set_option('errorpolicy=return');
    $this->p->set_option('textformat=utf8');
    $this->p->set_option('escapesequence=true');
}

//...

public function sendToBrowser()
{
    $this->bufferedContent = $this->p->get_buffer();

    header_remove();

    header('Content-Type: application/pdf');
    header('Content-Length: ' . strlen($this->bufferedContent));
    header('Content-Disposition: attachment; filename=pdftest.pdf');

    $bytesWritten = File::put(realpath(__DIR__ . '/../../public/assets/pdfs') . '/' . uniqid() . '.pdf', $this->bufferedContent);

    echo $this->bufferedContent;
    exit;
}

//...

}

Notice that in the PHP method I am writing the PDF file to disk prior to sending it in the response body. I added this bit to determine whether the PDF file written to disk is corrupted, too, and it is not; it opens perfectly well in every reader I've tried.

Observations and Theories

What I find so strange about this is that I've tried the download in three different browsers (the most recent versions of Chrome, Firefox, and IE 11), and the PDF size is drastically different in each. Following are the file sizes:

Written to disk (not corrupted): 105KB
Chrome: 193KB
Firefox: 188KB
IE 11: 141KB
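One plausible explanation for the growth, assuming the body is decoded as UTF-8 text somewhere along the way: XHR's `responseText` falls back to UTF-8 when the `Content-Type` carries no charset, and `new Blob([string])` re-encodes strings as UTF-8. Every byte that is not part of a valid UTF-8 sequence becomes U+FFFD, which takes three bytes when encoded again. The sketch below reproduces that round trip on a short, made-up PDF header; the byte values are illustrative and not taken from the actual file.

```javascript
// Bytes of a hypothetical PDF header: "%PDF-1.7\n" followed by the kind of
// binary comment line (four bytes >= 0x80) that PDF writers commonly emit.
const original = new Uint8Array([
  0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37, 0x0a, // "%PDF-1.7\n"
  0x25, 0xe2, 0xe3, 0xcf, 0xd3, 0x0a,                   // "%" + binary bytes + "\n"
]);

// Step 1: treat the body as text, as responseText would with no charset given.
const asText = new TextDecoder('utf-8').decode(original);

// Step 2: turn the string back into bytes, as new Blob([string]) does (UTF-8).
const corrupted = new TextEncoder().encode(asText);

// Each invalid byte became U+FFFD, which re-encodes as 3 bytes (EF BF BD).
const replacements = asText.split('\uFFFD').length - 1;

console.log(original.length, corrupted.length, replacements); // 15 23 4
```

How much a real file grows depends on how many of its bytes are invalid UTF-8 and on how each browser handles them, so this does not by itself account for the exact sizes listed above.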

At this point, I am convinced that the problem relates to the encoding used within the PDF. I discovered a discrepancy when using WinMerge to compare the copy of the PDF that I dump directly to disk before returning the HTTP response with the copy that is handled via AJAX.

The first clue was an error message that appears when I attempt to compare the two PDF documents.

I click OK to dismiss the error, and the comparison resumes.

The functional/correct PDF (at right, in WinMerge) is encoded using Windows-1252 (CP1252); I assume that this encoding is applied by PDFLib (despite its running on a GNU/Linux system). As the PHP snippet above shows, I am calling $this->p->set_option('textformat=utf8'); explicitly, but that option seems to set the encoding of the input text that is included in the PDF document, not the encoding of the document itself.

Ultimately, I am left wondering if there is any means by which to get this PDF to be displayed correctly after download.

Change the PDF Encoding, Instead?

I wonder if there is a "good reason" why PDFLib uses Windows-1252 encoding to generate the PDF document. Is there any chance that this is as simple as changing the encoding on the PDFLib side to match what jQuery's AJAX implementation requires (UTF-8)?
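Whether a PDF could pass through a text decode and re-encode unchanged comes down to whether all of its bytes form well-formed UTF-8 (aside from a leading BOM, which decoders may strip). A typical PDF contains compressed streams and a binary comment line, so that is unlikely. As a sketch, a `TextDecoder` created with `fatal: true` throws instead of inserting U+FFFD, which makes a quick check possible for any given file; `isValidUtf8` is a hypothetical name.

```javascript
// Returns true if `bytes` is well-formed UTF-8.
// In fatal mode the decoder throws instead of substituting U+FFFD.
function isValidUtf8(bytes) {
  const decoder = new TextDecoder('utf-8', { fatal: true, ignoreBOM: true });
  try {
    decoder.decode(bytes);
    return true;
  } catch (e) {
    // A TypeError signals the first malformed byte sequence.
    return false;
  }
}

const asciiHeader = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x37]); // "%PDF-1.7"
const binaryComment = new Uint8Array([0x25, 0xe2, 0xe3, 0xcf, 0xd3]);                // "%" + binary bytes

console.log(isValidUtf8(asciiHeader));   // true
console.log(isValidUtf8(binaryComment)); // false
```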

I've consulted the PDFLib manual for more information, and there is a section dedicated to this subject: 4.2 Unicode-capable Language Bindings. This section has two subsections: 4.2.1 Language Bindings with native Unicode Strings (PHP is not among them) and 4.2.2 Language Bindings with UTF-8 Support (PHP falls into this category). But everything discussed therein seems to pertain to the strings that are inserted into the PDF body, not to the overall document encoding.

Then there is 4.4 Single-Byte (8-Bit) Encodings, with the following note:

Note: The information in this section is unlikely to be required in Unicode workflows.

How does one employ a Unicode workflow in this context?

The manual is available at http://www.pdflib.com/fileadmin/pdflib/pdf/manuals/PDFlib-9.0.5-tutorial.pdf for anyone who feels it may be useful.

Approaches That I'd Prefer to Avoid

I really hesitate to get into the business of re-encoding the PDF in JavaScript, client-side, once it has been downloaded. If that is the only means by which to achieve this, I will go another direction.

Initially, my primary aim was to avoid an approach that could leave abandoned PDF files lying around on the server in some temporary directory (thereby necessitating a cleanup cron job or similar), but that may be the only viable option.

If necessary, I will implement an interstitial step whereby I write the PDF file to disk (on the web server), pass it to the client using some unsightly hidden-iframe hack, and then delete the file once the user-agent receives it. Of course, if the user-agent never finishes the download, the user closes the browser, etc., the file will be abandoned and I'll have to clean it up by some other means (an idea I hate, on principle).

Any assistance with this is hugely appreciated.

Is each PDF file that is generated totally unique such that you wouldn't want to persist them for performance reasons anyway? — Mike Brant, Oct 22 '15 at 14:36
Yes, each PDF file that is generated is totally unique and has no further value once downloaded. — Ben Johnson, Oct 22 '15 at 15:33

score 0 · Answer 1 · edited May 23 '17 at 12:03

Have you tried this with an iframe?

I had the same problem, and I resolved it with an iframe. The code is ugly, but it works for me.

solution with iframe

answered Oct 22 '15 at 14:36 by RBoschini

Thanks for the suggestion, but the solution that you propose falls into the *Approaches That I'd Prefer to Avoid* section of my question. I am trying to avoid writing the PDF file to disk (server-side), which, unless I am missing something, this solution requires. – Ben Johnson Oct 22 '15 at 15:39