
I have a very large MySQL database consisting of 35 tables. Each table is different in structure, but all share some common fields. 'entity_name' is one such common field, present in every table as a VARCHAR. Some tables contain millions of records, while others contain billions.

I search for a key in all of these tables, pull the entity name plus some of the common fields from the tables containing matching records, push the results into an array, and finally display them in HTML. The queries are pretty straightforward:

SELECT * FROM table_name WHERE entity_name LIKE '%search_key%'; -- partial match

SELECT * FROM table_name WHERE entity_name LIKE 'search_key%'; -- begins with

SELECT * FROM table_name WHERE entity_name IN ('search_key'); -- equal to

Most often the first query is used.

The size of the database is around 9GB. The search process is very slow. From my research, I have found that indexes may not work here, since I am searching with entity_name LIKE '%search_key%' and the wildcard character appears at the front. Partitioning also does not help (please correct me if I am wrong), since we are filtering on a VARCHAR field.

Any ideas/suggestions are welcome.

Biju P Dais
  • (1) If your database is 9GB, then no table has "billions" of rows. (2) You may want to consider a full text index. – Gordon Linoff Jul 09 '18 at 11:55
  • An entity name as the main way into a database doesn't look like a good choice to me. I would create a table with an entity_id and entity_name, where the name must be unique, and then use the id in all the other tables (see the sketch below these comments). This won't solve your speed problem, though, unless there is a much lower number of different names, repeated many times in each table. – KIKO Software Jul 09 '18 at 11:59
  • Another, old, technique to speed up searches is to limit what needs to be searched through. Often this is done on date. So you could perform the search only on data less than 2 years old, unless the user specifically asks to dive into the 'archived' results. – KIKO Software Jul 09 '18 at 12:11
  • @KIKO Software Cannot archive records, the search has to operate on the full dataset. – Biju P Dais Jul 09 '18 at 12:31
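
A minimal sketch of the lookup-table idea from the comments above; the table and column names (entities, entity_id) are hypothetical:

CREATE TABLE entities (
    entity_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    entity_name VARCHAR(255) NOT NULL,
    UNIQUE KEY uq_entity_name (entity_name)  -- enforces one row per name
);

-- Each of the 35 tables would then store entity_id instead of entity_name,
-- and an exact-match search becomes an indexed lookup plus a join:
SELECT t.*
FROM table_name t
JOIN entities e ON e.entity_id = t.entity_id
WHERE e.entity_name = 'search_key';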

3 Answers


You are correct when you say LIKE '%search_key%' is inherently slow. That kind of search term is not sargable because it begins with %. Indexes on the entity_name column may still help a bit, especially if your tables have many columns: MySQL may be able to scan the index rather than the table. But it will never be fast.

You should consider using MySQL FULLTEXT searching for this application.
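
A minimal sketch of what that could look like, assuming one of the tables is named table_name (InnoDB supports FULLTEXT indexes as of MySQL 5.6):

-- One-time setup: add a FULLTEXT index on the common column.
ALTER TABLE table_name ADD FULLTEXT INDEX ft_entity_name (entity_name);

-- Natural language mode: matches whole words via the index.
SELECT * FROM table_name
WHERE MATCH(entity_name) AGAINST ('search_key' IN NATURAL LANGUAGE MODE);

-- Boolean mode: a trailing * acts as a begins-with wildcard on a word.
SELECT * FROM table_name
WHERE MATCH(entity_name) AGAINST ('search_key*' IN BOOLEAN MODE);

Note that FULLTEXT matches whole words rather than arbitrary substrings, so it is not a drop-in replacement for LIKE '%search_key%'; see the comments below.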

O. Jones
  • Thank you for your feedback. To my knowledge, FULLTEXT search pulls records containing individual matching words, if I am correct? Is there a way to match the string as a whole? And I believe we can't use wildcards in this? – Biju P Dais Jul 09 '18 at 12:27
  • "I believe we can't use wildcards in this? " MATCH clause (searching on FULL text indexes https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html) in boolean mode supports wildcards (capablities depens on used storage engine) and or boolean logic at the same time making it more usefull that the LIKE capablities @BijuPDais `MATCH(..) AGAINST ('+MySQL -YourSQL' IN BOOLEAN MODE);` would be written as `... LIKE('%MySQL %') AND ... NOT LIKE('%YourSQL%')` – Raymond Nijland Jul 09 '18 at 12:41

If your search_key is dynamic (searching in text), it would be better to use a different kind of database, something like Elasticsearch, for example, or another similar engine.

If your search_key could be defined as an ENUM, pull it out into a separate column and search on that instead.
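
A rough sketch of that idea; the column name and ENUM values here are hypothetical:

-- Pull the searchable value out of the free text into its own indexed column.
ALTER TABLE table_name
    ADD COLUMN entity_type ENUM('customer', 'supplier', 'partner') NOT NULL,
    ADD INDEX idx_entity_type (entity_type);

-- Equality on an indexed column avoids the wildcard scan entirely.
SELECT * FROM table_name WHERE entity_type = 'customer';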

Searching in free text is not the MySQL way.

Lyubetrix
  • We are hosting our MySQL db on the same host, so going for a new db might not work. Which other db do you suggest? – Biju P Dais Jul 09 '18 at 12:32
  • He said Elastic, which runs on Java (which is resource hungry and slow) and doesn't support SQL, so you would need to learn a new "language". @BijuPDais MongoDB would have been a better suggestion; you still need to learn a new "language", but at least it isn't running on Java. – Raymond Nijland Jul 09 '18 at 12:50

If you are showing your records on a web page, then I believe you may not need all the data in a single go. The best approach for such a huge amount of data is pagination.

You can limit your query to a number of records, maybe 10, 25 or 50 at a time, so that each query does not take much time to execute. Your query can be:

SELECT * FROM table_name WHERE entity_name LIKE '%search_key%' LIMIT x, y;

Here x is the offset (the number of rows to skip) and y is the number of rows to return. For example, the third page of 25 results per page would be LIMIT 50, 25.

In case you need the count of your matching records, use the query below to calculate just the count.

SELECT COUNT(*) FROM table_name WHERE entity_name LIKE '%search_key%';

Hope this helps!

DIGVJSS
  • This will not help. If you use pagination you have to use ORDER BY, and that means MySQL has to process the whole table before it can LIMIT the result. – KIKO Software Jul 09 '18 at 12:09
  • I am afraid I can't use pagination, since the results I display are obtained after querying all the tables. I use array_merge to combine the results into an array, then iterate over and display this array in the page. – Biju P Dais Jul 09 '18 at 12:29
  • Pagination might be unavoidable. What would happen if I searched on the entity_name 'E'? You would get so many results that the HTML output would run into gigabytes, if your array doesn't run out of memory before that. – KIKO Software Jul 09 '18 at 12:37
  • Another thing you can do is create a separate table for the search_key and reference it by ID; this will improve the speed to some extent. Else, use a NoSQL store for this table's data; that for sure will get your work done. – DIGVJSS Jul 09 '18 at 12:40