I would like to know if there is a way to select a PDF file using `<input type="file">` and open it using PDF.js.

Jorge Y. C. Rodriguez
user2422940
  • @Basj so basically you just would want to see an answer as seen below but mutated with the code from the answer from [this GitHub issue](https://github.com/mozilla/pdf.js/issues/11960)? Or is that a little too reductive? – Chiel Jun 03 '20 at 12:12
  • @Chiel Yes, in the meantime, [this comment](https://github.com/mozilla/pdf.js/issues/11960#issuecomment-637744616) showed adding `.promise.then(...)` solves it! – Basj Jun 03 '20 at 12:15
  • @Basj if I'm correct, answering this question shouldn't be too hard now then, right? – Chiel Jun 03 '20 at 12:17
  • Right, I didn't know that at the time of starting the bounty ;) – Basj Jun 03 '20 at 12:18

3 Answers

You should be able to use a FileReader to read the contents of the File object as an ArrayBuffer and wrap it in a typed array, which PDF.js accepts (see https://mozilla.github.io/pdf.js/examples/).

//Step 1: Get the file from the input element
inputElement.onchange = function(event) {

    var file = event.target.files[0];

    //Step 2: Read the file using a FileReader
    var fileReader = new FileReader();

    fileReader.onload = function() {

        //Step 4: Turn the ArrayBuffer into a typed array
        var typedarray = new Uint8Array(this.result);

        //Step 5: pdfjs should be able to read this
        const loadingTask = pdfjsLib.getDocument(typedarray);
        loadingTask.promise.then(pdf => {
            // The document is loaded here...
        });

    };
    //Step 3: Read the file as an ArrayBuffer
    fileReader.readAsArrayBuffer(file);

};

Edit: The PDF.js API has changed since I first wrote this answer in 2015. The code above now reflects the API as of 2021 (thanks to @Chiel for the updated answer).
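As a more compact variant, here is a minimal sketch that relies on the promise-based `Blob.arrayBuffer()` method available in current browsers instead of the `FileReader` callback. The helper name `readFileAsTypedArray` is hypothetical and not part of PDF.js; `pdfjsLib` and `inputElement` are assumed to be set up as in the code above.

```javascript
// Hypothetical helper: read a File (or any Blob) into a Uint8Array
// using the promise-based Blob.arrayBuffer() instead of FileReader.
async function readFileAsTypedArray(file) {
    const buffer = await file.arrayBuffer();
    return new Uint8Array(buffer);
}

// Usage in the browser (assumes pdfjsLib is loaded and inputElement exists):
// inputElement.onchange = async function (event) {
//     const typedarray = await readFileAsTypedArray(event.target.files[0]);
//     const pdf = await pdfjsLib.getDocument(typedarray).promise;
//     // The document is loaded here...
// };
```

The result is the same typed array as in the `FileReader` version, so the rest of the loading code is unchanged.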

Sam
  • Thanks @sam. To complement this answer: see here for how to extract the text using pdf.js: http://stackoverflow.com/a/20522307/408286 – mota Nov 03 '15 at 13:50
  • Hello. Thanks for your response. Is it possible to not use input type file? I have a string URL in Java and an h:inputHidden in XHTML. Thanks – Paladice Jun 11 '18 at 09:31
  • Seems to be working with pdfjsLib now. If we import the CDN version, PDF.js will be available as `var pdfjs = pdfjsLib.getDocument(typed_array)` – rags2riches-prog Oct 05 '18 at 13:59
  • It worked for me. I just needed to use `PDFJS.getDocument(this.result)` directly – Alessandro Gurgel May 01 '20 at 15:42
  • on rags2riches & Alessandro: I think it depends on how you loaded it: when you // Loaded via – Rustam A. Jul 24 '21 at 13:20

If getDocument().then is not a function:

I believe I have solved the problem introduced by the new API. As mentioned in this GitHub issue, the getDocument function no longer returns something you can call `.then` on directly; it returns a loading task that exposes a `promise` property. In short, this:

PDFJS.getDocument(typedarray).then(function(pdf) {
    // The document is loaded here...
});

became this:

const loadingTask = pdfjsLib.getDocument(typedarray);
loadingTask.promise.then(pdf => {
    // The document is loaded here...
});
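If the same code has to run against both older and newer builds, one option is to normalize the return value before using it. The following is only a sketch under the assumption, taken from the two snippets above, that older builds return something thenable and newer builds return an object with a `promise` property; `toDocumentPromise` is a hypothetical helper name, not part of PDF.js.

```javascript
// Hypothetical helper: normalize the return value of getDocument().
// Newer builds return a loading task with a `promise` property;
// older builds returned a value that could be used with .then() directly.
function toDocumentPromise(loadingTask) {
    if (loadingTask && loadingTask.promise &&
            typeof loadingTask.promise.then === 'function') {
        return loadingTask.promise;
    }
    // Promise.resolve() adopts thenables, so the old style works as well.
    return Promise.resolve(loadingTask);
}

// Usage (assumes pdfjsLib and typedarray as in the snippets above):
// toDocumentPromise(pdfjsLib.getDocument(typedarray)).then(pdf => {
//     // The document is loaded here...
// });
```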

Adapting the older answer to the new API, as the bounty requests, gives the following result:

//Step 1: Get the file from the input element
inputElement.onchange = function(event) {

    //Use the File object itself, not the file path (the path won't work because of browser security restrictions)
    var file = event.target.files[0];

    //Step 2: Read the file using a FileReader
    var fileReader = new FileReader();

    fileReader.onload = function() {

        //Step 4: Turn the ArrayBuffer into a typed array
        var typedarray = new Uint8Array(this.result);

        //Step 5: Load the document; the old function call is replaced with the new API
        const loadingTask = pdfjsLib.getDocument(typedarray);
        loadingTask.promise.then(pdf => {
            // The document is loaded here...
        });

    };
    //Step 3: Read the file as an ArrayBuffer
    fileReader.readAsArrayBuffer(file);

};

I have created an example below, using the official releases of the library, to show that it works.

/* Official release of the pdfjs worker */
pdfjsLib.GlobalWorkerOptions.workerSrc = 'https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.5.207/pdf.worker.js';
document.getElementById('file').onchange = function(event) {
  var file = event.target.files[0];
  var fileReader = new FileReader();
  fileReader.onload = function() {
    var typedarray = new Uint8Array(this.result);
    console.log(typedarray);
    const loadingTask = pdfjsLib.getDocument(typedarray);
    loadingTask.promise.then(pdf => {
      // The document is loaded here...
      // The code below is just for demonstration purposes, showing that it works with the modern API
      pdf.getPage(1).then(function(page) {
        console.log('Page loaded');

        var scale = 1.5;
        var viewport = page.getViewport({
          scale: scale
        });

        var canvas = document.getElementById('pdfCanvas');
        var context = canvas.getContext('2d');
        canvas.height = viewport.height;
        canvas.width = viewport.width;

        // Render PDF page into canvas context
        var renderContext = {
          canvasContext: context,
          viewport: viewport
        };
        var renderTask = page.render(renderContext);
        renderTask.promise.then(function() {
          console.log('Page rendered');
        });

      });
      //end of example code
    });

  }
  fileReader.readAsArrayBuffer(file);
}
<html>

  <head>
  <!-- The official release -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/pdf.js/2.5.207/pdf.js"> </script>
  </head>

  <body>
    <input type="file" id="file">
    <h2>Rendered pdf:</h2>
    <canvas id="pdfCanvas" width="300" height="300"></canvas>

  </body>

</html>

Hope this helps! If not, please comment.

Note:

This might not work in jsFiddle.

Chiel

I adopted your code and it worked! While looking around for more tips, I learned that there is an even more convenient method.

You can get a URL for the file the user selected with

URL.createObjectURL()

and pass that URL to getDocument. This removes one level of nesting, and you no longer need to read the file with a FileReader or convert it to a typed array.
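A minimal sketch of this approach, assuming `pdfjsLib` is loaded as in the other answers and that `getDocument` is given the blob URL as a string. The function name `loadPdfFromFile` is hypothetical, and the `pdfjs` parameter is only there so the function can be exercised without the real library.

```javascript
// Sketch: load a user-selected PDF through a blob: URL.
// `pdfjs` defaults to the global pdfjsLib used elsewhere in this thread.
function loadPdfFromFile(file, pdfjs = globalThis.pdfjsLib) {
    // Create a temporary blob: URL that points at the selected file.
    const objectUrl = URL.createObjectURL(file);
    const promise = pdfjs.getDocument(objectUrl).promise;
    // Return the URL as well, so the caller can release it later.
    return { objectUrl, promise };
}

// Usage in the browser (assumes an <input type="file" id="file">):
// document.getElementById('file').onchange = function (event) {
//     const { objectUrl, promise } = loadPdfFromFile(event.target.files[0]);
//     promise.then(pdf => {
//         // The document is loaded here...
//     });
//     // When the document is no longer needed: URL.revokeObjectURL(objectUrl);
// };
```

Object URLs stay alive until the page is unloaded or they are revoked, so it is worth calling `URL.revokeObjectURL()` once you are finished with the document.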