Is there a way to extract text from a PDF in JavaScript? I have already looked at some npm modules like PDF-TO-TEXT, but they all take a file path as input. I am using the react-drop-to-upload module to let the user drop a PDF onto a React component. The component receives the PDF as a File object rather than a file path. Is there a way to convert a PDF stored in a File object to text? Thanks!
- Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, [describe the problem](http://meta.stackexchange.com/q/139399/) and what has been done so far to solve it. - From [help/on-topic] – Luca Kiebel Apr 10 '18 at 22:31
- @Luca Thanks for the suggestion! – Rocking chief Apr 10 '18 at 22:33
- Just write the File object instance to a tmp file, and feed that to PDF-TO-TEXT. – chiliNUT Apr 10 '18 at 22:33
- Does this answer your question? [extract text from pdf in Javascript](https://stackoverflow.com/questions/1554280/extract-text-from-pdf-in-javascript) – John Goofy Jan 10 '21 at 13:05
1 Answer
PDF.js can load a PDF from in-memory data, such as the contents of a File object, and then extract the document's text. This example from the official website does exactly that.
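
For reference, here is a minimal sketch of how that could look. It is not the example from the PDF.js site. The function name `extractTextFromFile` is made up for illustration, and `pdfjsLib` is assumed to be the PDF.js entry point (for instance whatever you import from `pdfjs-dist`), passed in as an argument so the sketch does not depend on a particular import path. Depending on the PDF.js version and your bundler, you may also need to set `pdfjsLib.GlobalWorkerOptions.workerSrc` before calling it.

```javascript
// Sketch: extract text from a browser File object with PDF.js.
// `extractTextFromFile` is a hypothetical helper name; `pdfjsLib` is assumed
// to be the PDF.js module object supplied by the caller.
async function extractTextFromFile(file, pdfjsLib) {
  // Read the dropped File into memory; no file path is needed.
  const data = new Uint8Array(await file.arrayBuffer());

  // getDocument returns a loading task; its `promise` resolves to the document.
  const pdf = await pdfjsLib.getDocument({ data }).promise;

  const pages = [];
  // PDF.js page numbers start at 1.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Each text item carries its characters in `str`; join them with spaces.
    const pageText = content.items
      .map((item) => ("str" in item ? item.str : ""))
      .join(" ");
    pages.push(pageText);
  }

  // One line of output per page.
  return pages.join("\n");
}
```

In the drop handler you would call something like `extractTextFromFile(droppedFile, pdfjsLib).then(console.log)`, where `droppedFile` is the File object your component receives. Joining items with a space is a simplification; the original layout (columns, line breaks) is not preserved.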

tejzpr
- While this seems to be the only working solution at the moment, be warned that the prebuilt distributions of PDF.js may add about 870 KB to the bundle size of the importing application. – John Goofy Jan 10 '21 at 12:52