1

My database has almost 15.000.000 entries. Each entry is a phrase and each phrase starts in upper case and ends in a point (.) ex: "This is a sample phrase."

I need to get 10 random phrases that contain a specific word ( $q ). Because $q can be in the beginning of the phrase i will use also $q_upper for the query.

Using this query:

$res = mysqli_query($db,"SELECT quotes FROM corpus WHERE quotes LIKE '% ".$q." %' or quotes LIKE '% ".$q.".' or quotes '".$q_upper." %' order BY RAND() LIMIT 10");

Because its a large database it takes around 10 sec and a lot of resources to deliver results.

Is there a way I can optimize this query (or maybe split the database ) in order to make it efficiency.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Pedro P
  • 373
  • 1
  • 4
  • 16
  • May be [this](https://dzone.com/articles/how-to-optimize-mysql-queries-for-speed-and-perfor) will help you. – Robin Singh Dec 22 '20 at 07:50
  • Thanks Robin Singh; will check it now. – Pedro P Dec 22 '20 at 07:54
  • 1
    See about sql injection and the importance of prepared and bound queries – Strawberry Dec 22 '20 at 08:03
  • And are you seriously suggesting that your query produces a different result than simply `SELECT quotes FROM corpus WHERE quotes LIKE '%$q%' order BY RAND() LIMIT 10`? – Strawberry Dec 22 '20 at 08:06
  • if i query that way and looking for "more" i can get "moreover" or "morel" ( i need to have spaces... somehow ) – Pedro P Dec 22 '20 at 08:15
  • Then it seems you're looking for something like [a word match](https://stackoverflow.com/questions/656951/search-for-whole-word-match-in-mysql)? – El_Vanja Dec 22 '20 at 08:28
  • Thanks El_Vanja, that might be it, will test now. – Pedro P Dec 22 '20 at 08:35
  • **Warning:** You are wide open to [SQL Injections](https://php.net/manual/en/security.database.sql-injection.php) and should use parameterized **prepared statements** instead of manually building your queries. They are provided by [PDO](https://php.net/manual/pdo.prepared-statements.php) or by [MySQLi](https://php.net/manual/mysqli.quickstart.prepared-statements.php). Never trust any kind of input! Even when your queries are executed only by trusted users, [you are still in risk of corrupting your data](http://bobby-tables.com/). [Escaping is not enough!](https://stackoverflow.com/q/5741187) – Dharman Dec 22 '20 at 13:34
  • Did you have a look at [Full-Text Search index](https://dev.mysql.com/doc/refman/8.0/en/fulltext-search.html)? It should be faster than LIKE query – Pavel Parshin Dec 25 '20 at 08:29
  • Aside from query optimalizations - Do you really need to select from 15m records? What about having separate table, with far less records (10k maybe less), prepopulated as per your criteria and then select from there instead. – Marcin Orlowski Dec 25 '20 at 09:22
  • You can try [regexp_like](https://dev.mysql.com/doc/refman/8.0/en/regexp.html) but if you have a lot of data - it will be still slow. You need to [get rid](https://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast) of `ORDER BY RAND()` and multiple `OR LIKE`. As already mentioned - `full-text Search Index` can help you, or use another tools for this task - `elasticsearch`, `solr` or `sphinx` – Eugene K Feb 18 '21 at 06:42

1 Answers1

0

Use a case-folding collation -- That gets rid of q_upper.

Use FULLTEXT instead of LIKE -- that gives a lot of speed.

Rick James
  • 135,179
  • 13
  • 127
  • 222