0

Here is my query:

SELECT count(1)
FROM qanda question
JOIN qanda answer ON question.Id = answer.related
WHERE answer.related IS NOT NULL
AND answer.author_id = 29
AND question.amount IS NULL
AND answer.id not in (
  select post_id
  from votes
  where table_code = 15
  group by post_id
  having sum(value) < 0)

And this is the EXPLAIN result of it:

enter image description here

As you see, all tables are using an index. Ok, now I need to add one more condition on the WHERE clause of outer query. This is that condition:

... AND from_unixtime(answer.date_time) BETWEEN (now() - INTERVAL 1 year) AND (now() - INTERVAL 1 hour)

Well, after adding that new condition, this is the EXPLAIN result of it:

enter image description here

See? answer table doesn't use any index anymore. Why? And what index do I need to make query above faster and more efficient?

stack
  • 10,280
  • 19
  • 65
  • 117
  • 2
    The premise of your conclusion that if MySQL doesn't use an index that a query would be slower. That's wrong. Sometimes it doesn't pay off to use an index - more I/O will be done than doing a table scan. As you can see, the amount of rows MySQL inspects is low. If you add an additional condition MySQL has to inspect **more** rows. It decides that using an index won't help at all so it doesn't use it. In the end, your query is fast. Add more records and test this again (something like 100 000 records or so). It'll use indexes as your data set grows (given the fact their selectivity is high). – N.B. Sep 11 '16 at 01:05
  • @Drew Hah .. well I really like to write an answer for myself .. but sadly I don't have any clue `:-(` .. I even don't know why then number of *rows* which are scanned will be more when I add one more condition on the `WHERE` clause. – stack Sep 11 '16 at 01:36
  • No your question is fine – Drew Sep 11 '16 at 01:37
  • The answer to your question, or most of it, is at the bottom of [this](http://dev.mysql.com/doc/refman/5.7/en/mysql-indexes.html) right above User Comments. – Drew Sep 11 '16 at 01:39
  • @Drew Oh .. do you mean my dataset is small and I have to make a huge dataset and test that query again? – stack Sep 11 '16 at 01:41
  • 2
    Imagine having 2 houses in a town, and a phone book (the index) on a table in front of them. And you want to know the addresses of where Smith and Jones live. And one of the homes has Smith on a plaque above the door. Going to the book then the doors is slower than using your eyes or just doing a scan. – Drew Sep 11 '16 at 01:41
  • @Drew Great example. I got the point .. and now I'm wonder how much MySQL is intelligent. – stack Sep 11 '16 at 01:44
  • Yes. Create a dummy db then run the schema part of this answer of mine [here](http://stackoverflow.com/a/33666394). But stop after a few of the inserts and do an explain of like `explain select * from ratings where id=4`. When you have 11 rows, the explain shows a table scan. When you have like 1000 rows and up toward the 4.7 million, the index is used (Explain) – Drew Sep 11 '16 at 01:45
  • @Drew I see . . . . . Also +1 for your answer. `:-)` – stack Sep 11 '16 at 01:47
  • Why thank you. Ironically it is because of N.B. that I always put the disclaimer at the top saying ignore the MyISAM part of it, and it is in no way a recommendation to ever use it. But the reason it is used in that one is because of the INNODB gap anomaly and I wanted the `max()` to equal `count()`. They didn't need to equal each other, but I was feeling OCD at that moment. – Drew Sep 11 '16 at 01:51
  • In general for questions like these, always publish the output of `show create table xyz` for every table in the query. Doing so gives us the necessary info on indexes in use. Otherwise we are blind. As for the `const` not that you asked but that is similar to [This](http://stackoverflow.com/a/39419835) answer maybe. The first query of yours was trucking fine with `const` until your second query made `const` uneligible and the table scan ensured due to the low row count (2 houses thing) we were just talking about. Lastly, the row count number in Explain is often a wild a** guess. Not gospel – Drew Sep 11 '16 at 01:58
  • @Drew Ah I see .. ok .. next time I'll attach more detail of my database structure to the question. – stack Sep 11 '16 at 02:01
  • Please provide `SHOW CREATE TABLE`. – Rick James Sep 11 '16 at 02:52
  • Dup of http://dba.stackexchange.com/questions/149320/why-if-i-add-one-more-condition-on-where-clause-then-using-an-index-fails . – Rick James Sep 11 '16 at 22:03
  • Stack you rascal don't cross post – Drew Sep 12 '16 at 02:59
  • @Drew ok .. :-) – stack Sep 18 '16 at 21:48

1 Answers1

0

"Using index" implies that the index is "covering". That is, all the columns (from that table) that are needed for the query are in that composite index. It appears that answer has INDEX(author_id, related, id). Hence, the best approach is to use the index BTree and ignore the Data BTree.

When you added a condition on date_time, and that column was not part of the same index, the query might be run one of these ways:

  • Look in the index's BTree, then reach over to the data to check the date_time. Repeat.
  • Ignore the index and simply scan the data.

The cutoff between the two choices varies with the phase of the moon, but typically is somewhere around 20%. The EXPLAIN implies that it chose the latter approach. (See Drew's excellent analogy!)

Change the index to INDEX(author_id, date_time, related, id) (in that order) and change the new where clause to

AND datetime >= UNIX_TIMESTAMP(now() - INTERVAL 1 YEAR)
AND datetime  < UNIX_TIMESTAMP(now() - INTERVAL 1 HOUR)

This index will still be "covering", so it should continue to be "Using index". The order of columns is: (1) check for equality, (2) check a range, then whatever other columns are needed for 'covering'.

This index will be not quite as good for the original query, but it will still be 'covering'.

The change to the WHERE is to avoid "hiding" the column date_time inside a function, which prevents the use of an index.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Why do you think using both `>=` and `<` is better than `BETWEEN`? – Shafizadeh Sep 11 '16 at 21:16
  • `BETWEEN` id identical to `>=` and `<=` in performance and effect. When doing date ranges, I prefer `>=` and `<` so that I don't inadvertently pick up midnight at both ends. – Rick James Sep 11 '16 at 21:52