
I'm trying to programmatically supply the contents of an in-memory file to pdftotext, which is freely available at http://www.xpdfreader.com/about.html. Others seem to have done this; see "Passing string stored in memory to pdftotext, antiword, catdoc, etc".

I can programmatically handle the output:

const { spawn } = require( 'child_process' );
const fs = require( 'fs' );

const pdftotextExe = 'bin/bin64/pdftotext.exe';
const inputFile = 'simple.pdf';

const command = spawn( pdftotextExe, [inputFile, '-'], { stdio: ['pipe', 'pipe', 'pipe'] } );
command.stdout.on( 'data', chunk => console.log( `starting chars: ${chunk.toString( 'utf8' ).slice( 0, 5 )}` ) );
command.stdout.on( 'end', () => console.log( 'done run' ) );
command.stderr.on( 'data', ( err => console.log( `got error: >> ${err} <<` ) ) );

The above works, but supplying the input through stdin does not. The similar code below throws "TypeError [ERR_STREAM_NULL_VALUES]: May not write null values to stream":

const command = spawn( pdftotextExe, ['-', '-'], { stdio: ['pipe', 'pipe', 'pipe'] } );
fs.readFile( inputFile, {}, ( contents ) => {
  command.stdin.write( contents ); // write file contents
  command.stdin.end(); // end input
  command.stdout.on( 'data', chunk => console.log( `starting chars: ${chunk.toString( 'utf8' ).slice( 0, 5 )}` ) );
  command.stdout.on( 'end', () => console.log( 'done run' ) );
  command.stderr.on( 'data', ( err => console.log( `got error: >> ${err} <<` ) ) );
} );

What's wrong with the above code?

Govdata1
  • Can you use a read stream instead of readFile? `const readStream = fs.createReadStream(inputFile);` and then `readStream.pipe(command.stdin);`? – Aritra Chakraborty Jan 10 '20 at 20:28
  • I tried that. Essentially the same problem. The Xpdf docs don't mention '-' as an option for accepting input from stdin. But here's another example of using it: https://kaijento.github.io/2017/03/27/pdf-scraping-gwinnetttaxcommissioner.publicaccessnow.com/ I'm wondering if there's a difference between the Xpdf pdftotext and the poppler-utils pdftotext. It's unfortunate that the fork retained the same name. – Govdata1 Jan 10 '20 at 21:05
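
A note on the TypeError itself: Node's fs.readFile callback receives (err, data), so a callback declared with a single parameter binds the error slot, which is null when the read succeeds. Writing that value to stdin would produce exactly the ERR_STREAM_NULL_VALUES message. The sketch below shows the corrected callback shape. The helper names pipeBufferThrough and pipeFileThrough are hypothetical, introduced only for illustration. It addresses the TypeError only; whether a given pdftotext build treats '-' as stdin is a separate question that this does not settle.

```javascript
const { spawn } = require('child_process');
const fs = require('fs');

// Hypothetical helper: sends `input` (a Buffer) to the child's stdin and
// resolves with everything the child wrote to stdout.
function pipeBufferThrough(exe, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(exe, args, { stdio: ['pipe', 'pipe', 'pipe'] });
    const out = [];
    const errOut = [];
    // Attach listeners before writing so no output is missed.
    child.stdout.on('data', chunk => out.push(chunk));
    child.stderr.on('data', chunk => errOut.push(chunk));
    child.on('error', reject);           // e.g. executable not found
    child.stdin.on('error', () => {});   // ignore EPIPE if the child exits early
    child.on('close', code => {
      if (code === 0) resolve(Buffer.concat(out));
      else reject(new Error(`exit code ${code}: ${Buffer.concat(errOut)}`));
    });
    child.stdin.end(input);              // write the data, then close stdin
  });
}

// Hypothetical helper: reads a file and pipes its bytes through the child.
// Note the (err, contents) signature: the first argument is the error.
function pipeFileThrough(exe, args, inputFile) {
  return new Promise((resolve, reject) => {
    fs.readFile(inputFile, (err, contents) => {
      if (err) return reject(err);
      pipeBufferThrough(exe, args, contents).then(resolve, reject);
    });
  });
}

// Example call (not run here). Assumes the pdftotext build accepts '-' for stdin:
// pipeFileThrough('bin/bin64/pdftotext.exe', ['-', '-'], 'simple.pdf')
//   .then(text => console.log(`starting chars: ${text.toString('utf8').slice(0, 5)}`))
//   .catch(err => console.log(`got error: >> ${err} <<`));
```

If the TypeError disappears but pdftotext still reports an error on stderr, that would point to the executable not reading from stdin rather than to the Node code.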

0 Answers