I have thousands of searchable PDFs, some of which are up to 1 GB in size with over 2,000 pages. I need to be able to search these files for a text string from a Node.js app.
Right now, files are stored in a Google Cloud Storage bucket.
What's the best way to do this?
Some options:
- Extract the text from the PDF files into MySQL using an NPM package such as `pdf-text-extract`, then use MySQL queries to search for text strings.
- Search the PDF files directly using some NPM package.
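The first option boils down to "extract once, search many times". Here is a minimal in-memory sketch of that idea. It assumes the text has already been pulled out of each PDF as one string per page (`pdf-text-extract`, which wraps `pdftotext`, can return per-page text, but check its README). The class `PageIndex` and its methods `addDocument` and `search` are hypothetical names used only for illustration. This is a stand-in for what a MySQL `FULLTEXT` index or a dedicated search engine would do persistently, not a production implementation.

```javascript
'use strict';

// Hypothetical sketch: index page text once, then answer phrase queries
// without re-reading any PDF. Page text is assumed to be extracted already.

// Lowercase and collapse whitespace so matching ignores case and line breaks.
function normalize(text) {
  return text.toLowerCase().replace(/\s+/g, ' ').trim();
}

// Split normalized text into word tokens (letters and digits only).
function tokenize(text) {
  return normalize(text).split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

class PageIndex {
  constructor() {
    this.pages = [];           // entries of { file, page, text }
    this.postings = new Map(); // token -> Set of page ids containing it
  }

  // Register every page of one document; page numbers are 1-based.
  addDocument(file, pageTexts) {
    pageTexts.forEach((raw, i) => {
      const id = this.pages.length;
      this.pages.push({ file, page: i + 1, text: normalize(raw) });
      for (const token of new Set(tokenize(raw))) {
        if (!this.postings.has(token)) this.postings.set(token, new Set());
        this.postings.get(token).add(id);
      }
    });
  }

  // Return pages containing the phrase: intersect the posting lists of its
  // words to get candidate pages, then confirm with a substring check.
  search(phrase) {
    const tokens = tokenize(phrase);
    if (tokens.length === 0) return [];
    let candidates = null;
    for (const token of tokens) {
      const ids = this.postings.get(token);
      if (!ids) return []; // a word that appears nowhere means no match
      candidates = candidates === null
        ? new Set(ids)
        : new Set([...candidates].filter((id) => ids.has(id)));
      if (candidates.size === 0) return [];
    }
    const needle = normalize(phrase);
    return [...candidates]
      .sort((a, b) => a - b)
      .filter((id) => this.pages[id].text.includes(needle))
      .map((id) => ({ file: this.pages[id].file, page: this.pages[id].page }));
  }
}

// Usage with made-up page text standing in for extracted PDF content.
const index = new PageIndex();
index.addDocument('contracts/a.pdf', ['Termination clause applies.', 'Payment terms: net 30.']);
index.addDocument('contracts/b.pdf', ['The Termination\nClause is on this page.']);
console.log(index.search('termination clause'));
```

The search above should report page 1 of both files, even though the second file breaks the phrase across a line and capitalizes it differently. The expensive step (parsing a 1 GB PDF) happens once at indexing time; searching the PDFs directly would repeat that parsing for every query.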
Am I completely off? Is there a better way?