
I am following the "Code Example" guide in the pdf2json README on GitHub: https://github.com/modesty/pdf2json#code-example

In the example titled "Parse a PDF then write a .txt file (which only contains textual content of the PDF)", I copied and pasted the exact implementation into a local JavaScript file and ran it, but the output text file was completely blank.

'use strict';

let fs = require('fs');
let PDFParser = require("pdf2json");

let pdfParser = new PDFParser();

pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError) );
pdfParser.on("pdfParser_dataReady", pdfData => {
    fs.writeFile("./node_modules/pdf2json/test/F1040EZ.content.txt", pdfParser.getRawTextContent());
});

pdfParser.loadPDF("./node_modules/pdf2json/test/pdf/fd/form/F1040EZ.pdf");

Am I doing something wrong, or is this broken on their end? Also, are there any alternative PDF-to-text converters for Node.js that do not require additional binaries to be installed?

ThePumpkinMaster
  • This is a vast topic. The [pdf](http://stackoverflow.com/tags/pdf/info) tag states "Extracting text from a PDF may not be possible without resorting to Optical Character Recognition (OCR). Letters can be encoded as font glyphs, line art, vector graphics, or raster images". More background and sample PDFs would be needed for anyone to advise further. – dwarring Jun 12 '16 at 22:43

1 Answer


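The front-page documentation is slightly wrong! To make this work, pass two arguments to the PDFParser constructor: a context (null or this) and 1. The second argument is what makes the raw text available to getRawTextContent().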

This one works:

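var fs = require("fs");

// https://github.com/modesty/pdf2json
var PDFParser = require("pdf2json");
// Second argument 1 asks the parser to keep raw text for getRawTextContent()
var pdfParser = new PDFParser(this, 1);

pdfParser.on("pdfParser_dataError", errData => console.error(errData.parserError));
pdfParser.on("pdfParser_dataReady", pdfData => {
    // fs.writeFile needs a callback on current Node.js versions
    fs.writeFile("./content.txt", pdfParser.getRawTextContent(), err => {
        if (err) console.error(err);
    });
});

// Start parsing; same sample PDF as in the question
pdfParser.loadPDF("./node_modules/pdf2json/test/pdf/fd/form/F1040EZ.pdf");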
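
If you would rather get the text back as a value instead of writing the file inside the event handler, the same event flow can be wrapped in a Promise. This is only a rough sketch: extractText is a hypothetical helper of mine, not part of pdf2json, and it assumes the parser you pass in was created with new PDFParser(null, 1) so that getRawTextContent() has something to return.

```javascript
'use strict';

// Hypothetical helper (NOT part of pdf2json): wraps the event-based parser
// in a Promise. `parser` is assumed to behave like a pdf2json PDFParser
// created with `new PDFParser(null, 1)`: it emits "pdfParser_dataReady" and
// "pdfParser_dataError", and exposes loadPDF() and getRawTextContent().
function extractText(parser, pdfPath) {
    return new Promise((resolve, reject) => {
        // Reject with the same error object the answer above logs
        parser.on("pdfParser_dataError", errData => reject(errData.parserError));
        // Resolve with the raw text once parsing has finished
        parser.on("pdfParser_dataReady", () => resolve(parser.getRawTextContent()));
        // Register the handlers first, then start parsing
        parser.loadPDF(pdfPath);
    });
}

// Possible usage with the real library (requires `npm install pdf2json`):
//   const fs = require("fs");
//   const PDFParser = require("pdf2json");
//   extractText(new PDFParser(null, 1), "./F1040EZ.pdf")
//       .then(text => fs.promises.writeFile("./content.txt", text))
//       .catch(console.error);
```

Passing the parser in as an argument keeps the helper independent of how pdf2json is required, and it means you can try it with a stub object before pointing it at a real PDF.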

HTH -XDVarpunen

Link to issue in pdf2json: https://github.com/modesty/pdf2json/issues/76

xdvarpunen