
I have read elsewhere (Reading PDF file using javascript) how to read the text items in a PDF file and show them in the console. This is done using the following code:

var PdfReader = require("pdfreader").PdfReader;
new PdfReader().parseFileItems("sample.pdf", function(err, item){
  if (item && item.text)
    console.log(item.text);
});

My question is: instead of showing the text items in the console using console.log, how do I store them in an array for use at a later stage of the script?

agreppi

2 Answers


Initialize an array above the parse function, then push the items to the array:

var PdfReader = require("pdfreader").PdfReader;
var arr = [];
new PdfReader().parseFileItems("sample.pdf", function(err, item){
  if (item && item.text) {
    arr.push(item.text);
  }
});
console.log(arr);
Sean
  • I already tried that, and what's happening is that the last command (console.log(arr)) is actually executed BEFORE the parseFileItems function finishes. So what I get is just []. I guess I need to use some kind of promise or async, but I don't know how. – agreppi Jun 09 '22 at 19:57

const { PdfReader } = require("pdfreader");
var arr = [];
new PdfReader().parseFileItems("test/sample.pdf", (err, item) => {
  if (err) console.error("error:", err);
  else if (!item) console.log(arr);
  else if (item.text) arr.push(item.text);
});

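I modified the code from the official example (https://github.com/adrienjoly/npm-pdfreader#raw-pdf-reading). The callback is called without an item once the end of the file is reached, so the array is complete at that point and can be used from there.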
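If you would rather use the array later in the script with a promise or async/await, as the comment above suggests, the same end-of-file check can be wrapped in a Promise. This is only a minimal sketch: collectTextItems and readPdfText are hypothetical helper names, not part of pdfreader, and it assumes the callback is invoked with no item at the end of the file, as in the official example.

```javascript
// Hypothetical helper: turns a callback-style item parser into a Promise.
// `parse` is any function that calls cb(err, item) once per item and
// cb(null, <falsy item>) when the end of the file is reached.
function collectTextItems(parse) {
  return new Promise((resolve, reject) => {
    const texts = [];
    parse((err, item) => {
      if (err) reject(err);                       // parsing failed
      else if (!item) resolve(texts);             // end of file: array is complete
      else if (item.text) texts.push(item.text);  // keep text items only
    });
  });
}

// Hypothetical wrapper around pdfreader (assumes the package is installed).
function readPdfText(path) {
  const { PdfReader } = require("pdfreader"); // loaded lazily, only when called
  return collectTextItems((cb) => new PdfReader().parseFileItems(path, cb));
}
```

It could then be used as readPdfText("sample.pdf").then((texts) => { /* use texts here */ }), or with const texts = await readPdfText("sample.pdf") inside an async function. Any code that needs the array has to run after the promise resolves.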

kirogasa