
I'm using Node.js and the pdf2json parser to parse a PDF file. It currently works with a local PDF file, but now I'm trying to get a PDF file from a URL using Node.js's url/http modules, and I want to open that file and parse it.

Is there any way to parse or work with an online PDF?

    let query   = url.parse(req.url, true).query;
    let pdfLink = query.pdf;
    ...
    pdfParser.loadPDF(pdfLink + ""); // loadPDF() reads from the local file system, not over HTTP

So the PDF link should be passed as a query parameter in the URL, like: https://localhost:8080/?pdf=http://whale-cms.de/pdf.pdf
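`url.parse()` is a legacy API in current Node.js. A minimal sketch of reading the same `pdf` parameter with the WHATWG `URL` class, using a hypothetical helper named `getPdfLink`. Because the value comes straight from the client, it also rejects anything that is not an absolute http(s) URL. If the PDF link itself contains `?` or `&`, the client should encode it with `encodeURIComponent()`, otherwise part of it is read as a separate parameter.

```javascript
// Hypothetical helper: read the `pdf` query parameter from a request URL
// and accept it only if it is an absolute http(s) URL.
function getPdfLink(reqUrl) {
  // req.url is only path + query (e.g. "/?pdf=..."), so supply a dummy base.
  const { searchParams } = new URL(reqUrl, "http://localhost");
  const raw = searchParams.get("pdf");
  if (!raw) return null; // parameter missing or empty

  try {
    const link = new URL(raw); // throws if raw is not an absolute URL
    return link.protocol === "http:" || link.protocol === "https:" ? link.href : null;
  } catch {
    return null; // not a valid absolute URL
  }
}
```

In the question's handler this would replace the first two lines: `let pdfLink = getPdfLink(req.url);`, answering with an error status when it returns `null`.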

Is there any way to parse the PDF directly from the online link?

Thanks in advance.

Daniel Wahl

1 Answer


I just faced the same problem and found a solution:

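        // Stream the remote PDF straight into pdf2json instead of calling
        // loadPDF(), which only reads files from the local disk.
        const request = require("request"); // note: `request` has been deprecated since 2020
        const PDFParser = require("pdf2json");

        const pdfUrl = "http://localhost:3000/cdn/storage/PDFFiles/sk87bAfiXxPre428b/original/sk87bAfiXxPre428b";
        const pdfParser = new PDFParser();

        // encoding: null = binary data (Buffers), never decoded to strings
        const pdfPipe = request({ url: pdfUrl, encoding: null })
          .on("error", err => console.error(err)) // network errors (DNS, refused connection, ...)
          .pipe(pdfParser); // pipe() returns pdfParser, so the listeners below are on the parser

        pdfPipe.on("pdfParser_dataError", errData => console.error(errData.parserError));
        pdfPipe.on("pdfParser_dataReady", pdfData => {
          // list the types of the form fields used in the document
          const usedFieldsInTheDocument = pdfParser.getAllFieldsTypes();
          console.log(usedFieldsInTheDocument);
        });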
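Because the `request` package is deprecated, here is a minimal sketch of the same piping approach using only Node's built-in `http`/`https` modules. `streamRemotePdf(pdfUrl, destination, onError)` is a hypothetical helper; `destination` would be a pdf2json `PDFParser`, but any writable stream works, which keeps the sketch testable. It does not follow redirects, so a URL that answers with 301/302 is reported as an error.

```javascript
// Sketch: pipe a remote file into a writable stream (e.g. a pdf2json
// PDFParser) using only Node's built-in http/https modules.
const http = require("http");
const https = require("https");

function streamRemotePdf(pdfUrl, destination, onError = console.error) {
  // pick the client module that matches the URL's protocol
  const client = new URL(pdfUrl).protocol === "https:" ? https : http;

  client
    .get(pdfUrl, res => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        onError(new Error(`Download failed: HTTP ${res.statusCode}`));
        return;
      }
      res.pipe(destination); // raw Buffer chunks flow into the destination
    })
    .on("error", onError); // DNS / connection errors

  return destination;
}
```

With pdf2json this would be `streamRemotePdf(pdfLink, pdfParser)`, followed by the same `pdfParser_dataReady` and `pdfParser_dataError` listeners as in the answer.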

Source: https://github.com/modesty/pdf2json/issues/65
Cheers
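Another option, assuming Node.js 18+ (for the global `fetch`) and a pdf2json version that provides `parseBuffer()`: download the whole file into memory and wrap the parser events in a Promise, which fits an `async` request handler. `parseRemotePdf` is a hypothetical name; the parser and the fetch function are passed in so the logic can be tested without a network. Since the whole PDF is buffered, this suits moderately sized files better than very large ones.

```javascript
// Sketch: download a remote PDF into memory, then hand the bytes to pdf2json.
// Assumes Node.js 18+ (global fetch) and a parser exposing parseBuffer() plus
// pdf2json's "pdfParser_dataReady" / "pdfParser_dataError" events.
async function parseRemotePdf(pdfUrl, pdfParser, fetchImpl = fetch) {
  const res = await fetchImpl(pdfUrl);
  if (!res.ok) {
    throw new Error(`Download failed: HTTP ${res.status}`);
  }

  // Response body (ArrayBuffer) -> Node Buffer for parseBuffer()
  const pdfBuffer = Buffer.from(await res.arrayBuffer());

  // Resolve with the parsed data, or reject with the parser's error
  return new Promise((resolve, reject) => {
    pdfParser.once("pdfParser_dataError", errData => reject(errData.parserError));
    pdfParser.once("pdfParser_dataReady", pdfData => resolve(pdfData));
    pdfParser.parseBuffer(pdfBuffer);
  });
}
```

In the question's server, `await parseRemotePdf(pdfLink, new PDFParser())` would take the place of `pdfParser.loadPDF(pdfLink + "")`.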

peter