Is there a way to extract text from a PDF in JavaScript? I have surveyed some npm modules such as PDF-TO-TEXT, but they all take a file path as input. I am using the react-drop-to-upload module to let the user drop a PDF onto a React component. The component receives the PDF as a File object rather than a file path. Is there a way to extract the text of a PDF stored in a File object? Thanks!

Rocking chief
  • Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, [describe the problem](http://meta.stackexchange.com/q/139399/) and what has been done so far to solve it. - From [help/on-topic] – Luca Kiebel Apr 10 '18 at 22:31
  • @Luca Thanks for the suggestion! – Rocking chief Apr 10 '18 at 22:33
  • Just write the File object instance to a temp file and feed that to PDF-TO-TEXT. – chiliNUT Apr 10 '18 at 22:33
  • Does this answer your question? [extract text from pdf in Javascript](https://stackoverflow.com/questions/1554280/extract-text-from-pdf-in-javascript) – John Goofy Jan 10 '21 at 13:05
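The temp-file workaround from chiliNUT's comment only works where there is a filesystem, i.e. in Node.js on a server; browser code cannot write temp files, so the dropped File would first have to be uploaded. A minimal sketch, assuming the server already has the PDF's bytes as a Buffer. `textFromPdfBytes` is a hypothetical helper, and `extractFromPath` is a hypothetical adapter you would write around whichever path-based library you use (PDF-TO-TEXT or similar); its real API is not shown here.

```javascript
// Sketch of the "write to a temp file" workaround (Node.js only).
const fs = require("fs/promises");
const os = require("os");
const path = require("path");

/**
 * Write PDF bytes to a temporary file, run a path-based extractor on it,
 * and always delete the temporary directory afterwards.
 * @param {Buffer|Uint8Array} pdfBytes - raw bytes of the uploaded PDF.
 * @param {(filePath: string) => Promise<string>} extractFromPath -
 *   hypothetical adapter around a path-based extraction library.
 * @returns {Promise<string>} whatever text the extractor returns.
 */
async function textFromPdfBytes(pdfBytes, extractFromPath) {
  // mkdtemp creates a uniquely named directory, so concurrent uploads don't clash.
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "pdf-upload-"));
  const filePath = path.join(dir, "upload.pdf");
  try {
    await fs.writeFile(filePath, pdfBytes);
    return await extractFromPath(filePath);
  } finally {
    // Clean up even if extraction throws.
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```

The `try`/`finally` is the main design point: without it, a failing extraction would leave uploaded PDFs lying around in the temp directory.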

1 Answer

PDF.js can load a PDF from the bytes of a File object (read, for example, with `FileReader` or `file.arrayBuffer()`) and then extract the document's text. This example from the official website does exactly that.
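A minimal sketch of that approach, assuming the `pdfjs-dist` package. `pdfFileToText` and `pdfDocumentToText` are hypothetical helper names; `getDocument`, `getPage`, `getTextContent` and `destroy` are PDF.js calls. The PDF.js module is passed in as a parameter because its import path differs between `pdfjs-dist` versions and builds, which also makes the helpers easy to test with a stub.

```javascript
// Minimal sketch: extract plain text from a PDF held in a browser File object
// using PDF.js. The imported PDF.js module is passed in by the caller.

/**
 * Read every page of an already-loaded PDF.js document and return its text.
 * @param {object} pdf - a PDFDocumentProxy (has numPages and getPage()).
 * @returns {Promise<string>} page texts separated by blank lines.
 */
async function pdfDocumentToText(pdf) {
  const pages = [];
  // PDF.js page numbers are 1-based.
  for (let pageNumber = 1; pageNumber <= pdf.numPages; pageNumber++) {
    const page = await pdf.getPage(pageNumber);
    const content = await page.getTextContent();
    // Text items carry their string in `str`; marked-content items have none.
    const text = content.items
      .filter((item) => typeof item.str === "string")
      .map((item) => item.str)
      .join(" ");
    pages.push(text);
  }
  return pages.join("\n\n");
}

/**
 * Convert a File (e.g. the one a drop handler receives) to plain text.
 * @param {File} file - the dropped PDF file.
 * @param {object} pdfjsLib - the imported PDF.js module.
 * @returns {Promise<string>}
 */
async function pdfFileToText(file, pdfjsLib) {
  // File inherits arrayBuffer() from Blob; PDF.js accepts raw bytes via `data`.
  const data = new Uint8Array(await file.arrayBuffer());
  const pdf = await pdfjsLib.getDocument({ data }).promise;
  try {
    return await pdfDocumentToText(pdf);
  } finally {
    // Release the resources PDF.js holds for this document.
    await pdf.destroy();
  }
}
```

In the React component's drop handler you would call something like `pdfFileToText(file, pdfjsLib)` on the File you receive. PDF.js usually also needs `pdfjsLib.GlobalWorkerOptions.workerSrc` pointed at the worker script your bundler serves. Joining items with spaces does not preserve the page layout; it only recovers the text in content-stream order.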

tejzpr
  • While this seems to be the only working solution at the moment, be warned that the prebuilt distributions of PDF.js may add about 870 KB to the bundle size of the importing application. – John Goofy Jan 10 '21 at 12:52