6

I have a table with 2 million rows. I have two index (status, gender) and also (birthday).

I find strange that this query is taking 3.6 seconds or more QUERY N° 1

SELECT COUNT(*) FROM ts_user_core
WHERE birthday BETWEEN '1980-01-01' AND '1985-01-01'
    AND status='ok' AND gender='female';

same for this: QUERY N° 2

SELECT COUNT(*) FROM ts_user_core
WHERE status='ok' AND gender='female'
    AND birthday between '1980-01-01' AND '1985-01-01';

While this query is taking 0.140 seconds QUERY N° 3

select count(*) from ts_user_core where (birthday between '1990-01-01' and '2000-01-01');

Also this query takes 0.2 seconds QUERY N° 4

select count(*) from ts_user_core where status='ok' and gender='female'

I expect the first query to be way more faster, how can be possible this behavior? I can't handle so much time for this query.

Here the result of: enter image description here

I know that I can add a new index with 3 columns, but is there a way to have a faster query without adding an index for every where clause?

Thanks for your advice

Stefano Giacone
  • 2,016
  • 3
  • 27
  • 50
  • 2
    For `0.140 ` where clause is different. There can be billion of rows between '1980-01-01' and '1985-01-01' and only one row between '1990-01-01' and '2000-01-01' – Giorgi Nakeuri Apr 28 '15 at 13:51
  • 5
    While MySQL can use multiple indexes in a query, it chooses one that may not be optimal. [See this thread, specifically the accepted answer.](http://stackoverflow.com/questions/12222630/can-mysql-use-multiple-indexes-for-a-single-query) You may benefit from indexing `(status, gender, birthday)`. – bishop Apr 28 '15 at 13:53
  • 1
    The first and second queries are, in fact, identical. The order of conditions in the `WHERE` clause does not matter, MySQL evaluates them in the order it finds most appropriate to get the results faster. – axiac Apr 28 '15 at 14:13

4 Answers4

3

is there a way to optimize the query without adding an index for every possible where clause?

Yes, somewhat. But it takes an understanding of how INDEXes work.

Let's look at all the SELECTs you have presented so far.

  1. To build the optimal index for a SELECT, start with all the = constant items in the WHERE clause. Put those columns into an index in any order. That gives us INDEX(status, gender, ...) or INDEX(gender, status, ...), but nothing deciding between them (yet).
  2. add on one range or all the ORDER BY. In your first couple of SELECTs, that would be birthday. Now we have INDEX(status, gender, birthday) or INDEX(gender, status, birthday). Either of these is 'best' for the first two SELECTs.

Those indexes work quite well for #4: select count(*) from ts_user_core where status='ok' and gender='female', too. So no extra index needed for it.

Now, let's work on #3: select count(*) from ts_user_core where (birthday between '1990-01-01' and '2000-01-01');

  • It cannot use the indexes we have so far.
  • INDEX(birthday) is essentially the only choice.

Now, suppose we also had ... WHERE status='foo'; (without gender). That would force us to pick INDEX(status, gender, birthday) instead of the variant of it.

Result: 2 good indexes to handle all 5 selects:

INDEX(status, gender, birthday)
INDEX(birthday)

Suggestion: If you end up with more than 5 INDEXes or an index with more than 5 columns in it, it is probably wise to shorten some indexes. Here is where things get really fuzzy. If you would like to present me with a dozen 'realistic' indexes, I'll walk you through it.

Notes on other comments:

  • For timing, run each query twice and take the second time -- to avoid caching effects. (Your 3.6 vs 0.140 smells like caching of the index.)
  • For timing, turn off the Query cache or use SQL_NO_CACHE.
  • The optimizer rarely uses two indexes in a single query.
  • Show us the EXPLAIN plain; we can help you read it.
  • The extra time taken to pick among multiple INDEXes is usually worth it.
  • If you have INDEX(a,b,c), you don't need INDEX(a,b).
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • thanks very much! It can be an issue to have 10 indexes on the same table? – Stefano Giacone May 16 '15 at 10:20
  • You _can_ have 10 indexes on a single table. But it is not necessarily wise. However, there is no simple way to decide whether 10 is ok. Go ahead and do 10; start another question if it seems to give you trouble. [This blog](http://mysql.rjweb.org/doc.php/index_cookbook_mysql) may help some. – Rick James May 16 '15 at 14:56
1

In first case, you have two indexes, and while MySQL optimizer read your query, it must find out which plan is more optimal.

Because you have two indexes, optimizer spend more time to decide which plan is more optimal, because it create more possible execution plans.

In second cases, MySQL positions at first index page which consist status 'ok' and read all pages while gender is not changed to 'male', which is faster than first case.

Try to create one index with three columns from WHERE clause.

veljasije
  • 6,722
  • 12
  • 48
  • 79
1

It's more than likely the case that mysql is terminating your index usage after it performs a range scan over your date range.

Run the following queries in the mysql client to see how it's using your indices:

EXPLAIN EXTENDED 
SELECT COUNT(*) FROM ts_user_core
WHERE birthday BETWEEN '1980-01-01' AND '1985-01-01'
AND status='ok' AND gender='female';

SHOW INDEX IN ts_user_core;

I'm guessing that your index or primary key has birthday before status and/or gender in the index causing a range scan. Mysql will terminate all further index usage after it performs a range scan.

If that's the case, you can then re-arrange the columns in your index to move status and gender before birthday or create a new index specifically for this query with status and gender before birthday.

Before you re-arrange an existing index, however, make sure that no other queries our system will run depend on the current ordering.

Vinbot
  • 518
  • 2
  • 14
1

The difference between no1 and no2 is down to the stored data being cached. If you had looked at the execution plans you would find they were exactly the same.

select count(*) from ts_user_core where (birthday between '1990-01-01' and '2000-01-01');

With an index on birthday will not look at the table data (and similarly for status and gender). But MySQL can only use one index per table - so for a query using both predicates, it will select the more specific index (shown in EXPLAIN) to resolve the predicate, then fetch the corresponding table rows (expensive operation) to resolve the second predicate.

If you either add an index with all 3 columns then you will have a covering index for the compound query. Alternatively, add the primary key (you didn't tell us the structure of the table, I'll assume "id") and...

SELECT COUNT(*)
FROM ts_user_core bday
INNER JOIN ts_user_core stamf
ON bday.id=stamf.id
WHERE bday.birthday BETWEEN '1980-01-01' AND '1985-01-01'
AND stamf.status='ok' AND stamf.gender='female';

Side note:

status='ok' AND gender='female'

Columns which have a small set of possible values and/or skewed data (such that some values are MUCH more frequent than others) tend not to work well as indexes, although the stats here suggest that might not be an issue.

symcbean
  • 47,736
  • 6
  • 59
  • 94
  • I tried your query, should it run faster than mine? Because I tried it and it took 19 seconds... – Stefano Giacone Apr 28 '15 at 14:52
  • Did you change the indexes? – symcbean Apr 28 '15 at 14:55
  • mysql can use more than one index per table per query, ever since version 5.0 when it gained index merge support. https://dev.mysql.com/doc/refman/5.0/en/index-merge-optimization.html – pala_ Apr 28 '15 at 14:59
  • No I didn't change the index – Stefano Giacone Apr 28 '15 at 15:12
  • @pala_: read the page you linked. Pay attention to the term "range scan" – symcbean Apr 28 '15 at 21:37
  • I have, and I did. a range scan selects rows in a range [using an index](https://dev.mysql.com/doc/refman/5.0/en/explain-output.html#jointype_range). And the link also very explicitly states that it uses multiple indexes on a single table reference. – pala_ Apr 28 '15 at 23:47
  • "If a range scan is possible on some key, the optimizer will not consider using Index Merge Union or Index Merge Sort-Union algorithms" – symcbean Apr 29 '15 at 09:33