3

I have two tables in MySQL, table1 has 1,013,347 entities and 38 attributes, and table2 has 7,343,905 entities and 10 attributes. In the following query (which is supposed to get the number of rows for pagination), table1.ID is a PK, table2.ID is its FK (both are indexed), and the HAVING clause gets the value if it's over a certain percentage, in this case 50%

SELECT SQL_CALC_FOUND_ROWS * 
FROM table1 INNER JOIN table2 ON table1.ID = table2.ID 
WHERE table1.attribute1 LIKE 'D%' 
GROUP BY table2.ID 
HAVING (COUNT(table2.ID) * (100/18)) >= '50'

Even in the simplified state which I posted here, this query takes no less than 5 minutes to run via command line. I know there must be changes I can make to the query, the PHP code (the values '50' and 'D' are assigned via PHP variables), and/or to my MySQL configuration to speed things up (I'm using the latest XAMPP with default configurations). Any help would be greatly appreciated.

EDIT1: All attributes are TINYTEXT except for the ID attributes which are VARCHAR(9).

EDIT2: EXPLAIN SELECT... returns:

+----+-------------+--------+------+---------------+-------------+---------+------+---------+---------------------------------+
| id | select_type | table  | type | possible_keys | key         | key_len | ref  | rows    | Extra                           |
+----+-------------+--------+------+---------------+-------------+---------+------+---------+---------------------------------+
|  1 | SIMPLE      | table2 | ALL  | NULL          | NULL        | NULL    | NULL | 7343905 | Using temporary; Using filesort |
|  1 | SIMPLE      | table1 | ref  | ID            | ID          | 29      | func |       1 | Using where                     |
+----+-------------+--------+------+---------------+-------------+---------+------+---------+---------------------------------+
2 rows in set (0.00 sec)
  • 2
    What do you get when you `EXPLAIN SELECT ...`? –  Apr 21 '12 at 23:07
  • 1
    It won't affect the performance but don't quote numbers, `'50'`, it'll trip you up eventually. Plus obligatory Coding Horror ["Pagination is dead" reference](http://www.codinghorror.com/blog/2012/03/the-end-of-pagination.html). – Ben Apr 21 '12 at 23:09
  • 3
    Have you indexed table1.attribute1, which you're doing a `LIKE` against? – Marc Apr 21 '12 at 23:10
  • No I have not, I will do so now, thanks! – deathonater Apr 21 '12 at 23:14
  • @deathonater can you also post the indexes on this table. The reason for slow is because of the filesort option. I would also add to the observation made by Ben, if you quote numbers then MySQL needs to do an atoi conversion for comparison. I have seen this affecting performance in earlier versions not sure which version you're using though. – g13n Apr 21 '12 at 23:26
  • @Ben thanks for the suggestion and the interesting article. In my case, I will need endless pagination since the data returned is going to be scoured by clients and employees for data mining purposes, also table1 will be upwards of 6 million entities when the development phase is over. – deathonater Apr 21 '12 at 23:34
  • What percentage of records are you filtering out with the LIKE clause and how many with the HAVING clause? – Ami Apr 22 '12 at 02:05

2 Answers2

3

Here are some potential improvements:

  • You are using ID's of type VARCHAR(9) and you are perfoming joins using those fields. It might be a good idea to introduce integer surrogate keys instead of varchars to speed up the join. See this discussion.
  • LIKE operator is usually expensive. Consider your usage; like Marc suggested, you should index table1.attribute1.
  • Maybe you can speed up the query by omitting LIKE altogether: for example, instead of using 'D%', you can use RIGHT(), although I'm not sure if it will be much faster. If the data in the table changes infrequently, you can create a new, indexed column with the beggining of table1.attribute1 values pre-cut; it however depends what values are inserted after LIKE by the php script.
Community
  • 1
  • 1
Dominik
  • 673
  • 5
  • 7
1

few ideas to improve performance

  1. index table2.ID (must) and table1.ID too
  2. if possible make the id columns bigint and table1.attribute1 a varchar. Please take care defining a proper size for varchar columns based on the presumed data length of the columns
  3. rather than making a computation inside the SQL (100/18), replace it like this

    HAVING (COUNT(table2.ID) * (5.5555)) >= 50

(since table2.ID is a bigint now, mathematical comparison has to be slightly faster)

as I see, the like clause is vital in this query, even though like is expensive, its better if you index the table1.attribute1 as well.

hope this helps

Sanath
  • 4,774
  • 10
  • 51
  • 81