How to extract text from a pdf file in javascript?

Question

Is there a way to extract text from a PDF in JavaScript? I have already looked at some npm modules such as PDF-TO-TEXT, but they all take a file path as input. I am using the react-drop-to-upload module to let the user drop a PDF onto a React component. The component receives the PDF as a File object rather than a file path. Is there a way to convert a PDF stored in a File object to text? Thanks!

Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, [describe the problem](http://meta.stackexchange.com/q/139399/) and what has been done so far to solve it. - From [help/on-topic] — Luca Kiebel, Apr 10 '18 at 22:31
just write the File Object instance to a tmp file, and feed that to PDF-TO-TEXT — chiliNUT, Apr 10 '18 at 22:33
Does this answer your question? [extract text from pdf in Javascript](https://stackoverflow.com/questions/1554280/extract-text-from-pdf-in-javascript) — John Goofy, Jan 10 '21 at 13:05

score 1 · Accepted Answer · answered Apr 10 '18 at 23:37


PDF.js lets you load a File object's contents and then read the document's text page by page. An example on the official website does exactly that.
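A minimal sketch of that approach, not the official example itself. It assumes `pdfjsLib` is the object exported by the `pdfjs-dist` package and that its worker has already been configured (via `pdfjsLib.GlobalWorkerOptions.workerSrc`). The helper name `extractTextFromFile` is hypothetical, and joining text items with spaces is an assumption that will not preserve the original layout.

```javascript
// Sketch: extract text from a File (or Blob) with PDF.js.
// `extractTextFromFile` is a hypothetical helper, not part of PDF.js.
// `pdfjsLib` is passed in so the caller controls how PDF.js is loaded.
async function extractTextFromFile(file, pdfjsLib) {
  // Read the dropped file into memory; no file path is needed.
  const data = new Uint8Array(await file.arrayBuffer());

  // getDocument returns a loading task; its `promise` resolves to the document.
  const pdf = await pdfjsLib.getDocument({ data }).promise;

  const pages = [];
  // PDF.js page numbers start at 1.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Each text item carries its string in `str`; skip anything without one.
    const pageText = content.items
      .filter((item) => typeof item.str === "string")
      .map((item) => item.str)
      .join(" "); // assumption: a space is a good enough separator
    pages.push(pageText);
  }
  // One line of output per page.
  return pages.join("\n");
}
```

In the callback that receives the dropped File, this could be called as `extractTextFromFile(file, pdfjsLib).then((text) => console.log(text))`. Scanned PDFs that contain only images have no text items, so the result would be empty without OCR.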


tejzpr


While this seems to be the only working solution at the moment, be warned that the prebuilt distributions of PDF.js may add about 870 KB to the bundle size of the importing application. – John Goofy Jan 10 '21 at 12:52
