
I have one query that is preventing me from going live with this application, because it can take up to 7 seconds to complete when it isn't cached.

```sql
SELECT attribute1
FROM `product_applications`
WHERE `product_applications`.`brand_id` IN (.. like 500 ids...)
GROUP BY attribute1
```

I have brand_id indexed. I used to do this with a SELECT DISTINCT, but switched to the GROUP BY and performance improved slightly.

This table is using InnoDB, and has about 2.3 million rows. I have run an EXPLAIN on it and it uses the index, it just takes forever.

I know there are a lot of variables to getting something like this to perform. The db is on an Amazon EC2 instance.

Is there some sort of table splitting I could do to get the query to perform better? I really appreciate any help anybody can offer.

EDIT:

Here are the results on my explain, from NewRelic:

```
Id             1
Select Type    SIMPLE
Table          product_applications
Type           range
Possible Keys  brand_search_index_1,brand_search_index_2,brand_search_index_3,brand_search_index_4,brand_sarch_index_5
Key            brand_search_index_1
Key Length     5
Ref
Rows           843471
Extra          Using where; Using index; Using temporary; Using filesort
```

See, it's using the index. But it's also using a temp table and filesort. How can I overcome that stuff?

EDIT:

Since the time I opened this question, I changed the engine on this table from InnoDB to MyISAM. I also vertically partitioned the table by moving attributes 5 through 60 to another table. But this select statement STILL TAKES BETWEEN 2 AND 3 SECONDS!!!! The poor performance of this query is absolutely maddening.

RubyRedGrapefruit
  • 12,066
  • 16
  • 92
  • 193

4 Answers


Please refer to the following answer:

Rewriting mysql select to reduce time and writing tmp to disk

  • why the down vote - it's pretty obviously using an innodb engine with a clustered primary key to maximise read performance – Jon Black Feb 11 '11 at 20:27

A different approach, if there are very few distinct values of attribute1, is to try an index on attribute1 to take advantage of the loose index scan.
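A minimal sketch of this idea (the index name is hypothetical; whether MySQL can actually use a loose index scan here depends on the query shape and version):

```sql
-- Hypothetical single-column index to enable a loose index scan
ALTER TABLE product_applications ADD INDEX idx_attribute1 (attribute1);

-- If the optimization applies, EXPLAIN should show
-- "Using index for group-by" instead of "Using temporary; Using filesort"
EXPLAIN SELECT attribute1 FROM product_applications GROUP BY attribute1;
```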

  • I have that index and it is being used. Still slow. – RubyRedGrapefruit Feb 11 '11 at 19:53
  • I will read up on this loose index scan Mark, thanks. There are about 100 different values of attributes1 across over 2 million rows, do you think that is too many? – RubyRedGrapefruit Feb 12 '11 at 02:32
  • @AKWF: No, that is not very many at all. How fast is this query: `SELECT DISTINCT attribute1 FROM product_applications`? If you have an index it might be fast. If it is, then maybe you can do `SELECT * FROM (SELECT DISTINCT attribute1 FROM product_applications) T1 WHERE EXISTS (SELECT * FROM product_applications T2 WHERE T2.attribute1 = T1.attribute1 AND T2.brand_id IN (...ids...))`. You can also try adding a two-column index on (attribute1, brand_id) for this second query. – Mark Byers Feb 12 '11 at 05:47

According to this answer, IN should be very fast when the list contains constants; otherwise type conversion happens, which can slow things down.

I would also try a covering index with brand_id as the first column and attribute1 as the second. That should speed things up because your table won't be accessed anymore.
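A minimal sketch of such a covering index (the index name is hypothetical):

```sql
-- Covering index: both the WHERE (brand_id) and the GROUP BY (attribute1)
-- can be satisfied from the index alone, avoiding table row lookups
ALTER TABLE product_applications
  ADD INDEX idx_brand_attr (brand_id, attribute1);
```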

EDIT:

About the temporary/filesort, I suspect they are caused by your list of 500+ ids. Could you try EXPLAIN on the query with only one id in the IN operator?

  • You are correct. With only one id, the EXPLAIN yields only "using WHERE". Adding one more id makes it list everything above. – RubyRedGrapefruit Feb 12 '11 at 02:29
  • @AKWF Then I don't think the temporary & filesort matter all that much. As [this answer](http://stackoverflow.com/questions/1382260/using-temporary-using-filesort-a-bad-idea-in-mysql/1382269#1382269) explains, they aren't necessarily bad. However, you can try to insert all the ids into an indexed temporary table to see if that helps. (If you have the ALTER TABLE permission, create the index on the temp table after the insert of your ids.) Also, have you tried the covering index? That should speed up the SELECT/GROUP BY and the `brand_id` part of your IN operator. – Sem Vanmeenen Feb 12 '11 at 10:10
  • That is a covering index Sem. – RubyRedGrapefruit Feb 13 '11 at 18:04
  • @AKWF Weird, I would expect the index to be enough. Could you try to use [SHOW INDEX](http://dev.mysql.com/doc/refman/5.1/en/show-index.html) on the index and add the information to your question ? Also, the method with a temporary table, described in my previous comment, can maybe help. – Sem Vanmeenen Feb 13 '11 at 19:00
  • @AKWF Something I just noticed while going over your question again. The EXPLAIN output says the key_len is 5. According to [this](http://dev.mysql.com/doc/refman/5.1/en/explain-output.html), the 5 means how many bytes MySQL uses of the index. But you also said that attribute1 is a varchar 255 and brand_id is probably an int. A key_len of 5 is possible for such a combination of columns, but it is suspiciously short. Could it be that your index does not cover both brand_id and attribute1 completely? MySQL allows partial indexes that only index the first x bytes of a column. – Sem Vanmeenen Feb 14 '11 at 07:36
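The temporary-table idea from the comments above can be sketched like this (the temp table name and the example ids are hypothetical):

```sql
-- Load the ~500 brand ids into an indexed temp table, then join against it
CREATE TEMPORARY TABLE tmp_brand_ids (
  brand_id INT NOT NULL,
  PRIMARY KEY (brand_id)
);
INSERT INTO tmp_brand_ids (brand_id) VALUES (1), (2), (3); -- ...and so on

SELECT pa.attribute1
FROM product_applications pa
JOIN tmp_brand_ids t ON t.brand_id = pa.brand_id
GROUP BY pa.attribute1;
```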

If you can reduce the size of your rows, that might help. Make as many columns as possible NOT NULL. If you can remove all varchar columns, that could help as well.

What exactly does the index it is using cover? Possibly try making the index cover less or more columns.

Have you run ANALYZE TABLE recently? That may cause it to pick another index. Also, you could try forcing certain indexes.
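A minimal sketch of both suggestions (the forced index name is taken from the EXPLAIN output in the question; the ids are placeholders):

```sql
-- Refresh index statistics so the optimizer can make a better choice
ANALYZE TABLE product_applications;

-- Or override the optimizer's index choice directly
SELECT attribute1
FROM product_applications FORCE INDEX (brand_search_index_1)
WHERE brand_id IN (1, 2, 3) -- ...your ids
GROUP BY attribute1;
```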

Is there a possibility of reducing the number of ids in the IN clause? What about using a range, if they are always sequential ids?
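If the ids really are sequential, the IN list could collapse to a range predicate, e.g. (bounds are placeholders):

```sql
-- Only valid when the ~500 ids form one contiguous block
SELECT attribute1
FROM product_applications
WHERE brand_id BETWEEN 1000 AND 1499
GROUP BY attribute1;
```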
