
I need to search the content of MS Word/PDF files for a text phrase and return the matching documents. I have over 10,000 documents. Which is faster for searching all of them for the phrase: storing the contents of each Word/PDF document in a MySQL table, or in a text file?

What is the best way to store the content of an MS Word/PDF document in a MySQL database, and which data type should the column use?

Swarne27

1 Answer


I would keep everything in Word or PDF format, but instead of searching the documents with PHP, I would write a Python script that searches through them, call it from PHP, and return the matching documents to PHP. Python is much faster for this kind of task.
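A minimal sketch of such a search script, assuming the documents are .docx files in a single folder (the script name `search_docs.py` and the function names are hypothetical). It uses only the standard library, relying on the fact that a .docx file is a ZIP archive whose body text lives in `word/document.xml`. Legacy binary .doc files and PDFs are not handled here; PDF text extraction would need a third-party library such as pypdf or an external tool such as pdftotext.

```python
import json
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(path):
    """Return the plain text of a .docx file, one line per paragraph."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    # Word may split one paragraph into several <w:t> runs, so join them per <w:p>
    return "\n".join(
        "".join(t.text or "" for t in p.iter(W + "t"))
        for p in root.iter(W + "p")
    )

def search_docx(folder, phrase):
    """Return names of .docx files in `folder` containing `phrase` (case-insensitive)."""
    needle = phrase.lower()
    matches = []
    for path in sorted(Path(folder).glob("*.docx")):
        try:
            if needle in docx_text(path).lower():
                matches.append(path.name)
        except (zipfile.BadZipFile, KeyError, ET.ParseError):
            continue  # skip corrupt files or ZIPs that are not Word documents
    return matches

if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 search_docs.py <folder> "<phrase>"; prints a JSON list for the PHP caller
    print(json.dumps(search_docx(sys.argv[1], sys.argv[2])))
```

From PHP, the script could be called with something like `shell_exec('python3 search_docs.py ' . escapeshellarg($dir) . ' ' . escapeshellarg($phrase))` and the output passed to `json_decode`. Each search still opens every file, so with 10,000+ documents it may be worth caching the extracted text instead of re-reading it on every query.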

If the document contents were in the database, a MySQL search would be fast too, but there are limits on content length (here is some info about the limitations), and you would also have to read all the documents and save them to the database first. I think you would save a lot of time by just writing a Python script to search through them.
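A sketch of the database route under stated assumptions: the text has already been extracted from each document, and Python's built-in sqlite3 module stands in for MySQL so the example is self-contained (the table and function names are hypothetical). The `LIKE '%phrase%'` query shown here still scans every row. In MySQL the usual approach is a LONGTEXT column (TEXT tops out at 64 KB and MEDIUMTEXT at 16 MB) with a FULLTEXT index, queried with `MATCH(content) AGAINST('"some phrase"' IN BOOLEAN MODE)` so the index, rather than a full scan, finds the rows; very large inserts are also limited by the server's `max_allowed_packet` setting.

```python
import sqlite3

# sqlite3 is only a stand-in for MySQL here, to keep the example self-contained

def create_schema(conn):
    # One row per document; in MySQL `content` would typically be LONGTEXT
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        " id INTEGER PRIMARY KEY,"
        " filename TEXT NOT NULL,"
        " content TEXT NOT NULL)"
    )

def index_document(conn, filename, text):
    """Store the already-extracted text of one document (done once, up front).

    The caller is expected to commit after a batch of inserts.
    """
    conn.execute(
        "INSERT INTO documents (filename, content) VALUES (?, ?)", (filename, text)
    )

def find_phrase(conn, phrase):
    """Return filenames whose content contains `phrase` (ASCII case-insensitive in SQLite)."""
    # Escape LIKE wildcards so '%' and '_' in the phrase are matched literally
    escaped = phrase.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT filename FROM documents WHERE content LIKE ? ESCAPE '\\' ORDER BY filename",
        ("%" + escaped + "%",),
    )
    return [row[0] for row in rows]
```

The extraction cost the answer mentions is paid once in `index_document`; after that, each search is a single query instead of opening thousands of files.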

EDIT

Here are some performance tests from 2016. If you are using PHP 7, it is actually the fastest: https://blog.famzah.net/2016/02/09/cpp-vs-python-vs-perl-vs-php-performance-benchmark-2016/

Also check this article: "Python is further considered to be the best programming language for developing scientific applications and applications that are required to process a huge amount of data."

Silko
  • Why is Python faster than PHP? – Swarne27 Oct 12 '16 at 23:32
  • Check my edit; I gave you more info. Python is used in AI and machine learning and is the best for processing huge amounts of data. I don't know about PHP 7, but it looks like they made it much better. – Silko Oct 14 '16 at 20:41
  • I do not wish to change the programming language in order to do this. – Swarne27 Oct 15 '16 at 07:15