0

I have a problem with where clause ordering.

I saw from other questions that order of where clause doesn't effect performance of sql query, but my query performance changes as I change order.

select sql_no_cache idx
from rr
where (timestamp >= '2016-11-28' and timestamp <= '2016-12-28')
  and ((select name from xx where midx=rr.midx) like 'test'

is faster than

select sql_no_cache idx
from rr
where ((select name from xx where midx=rr.midx) like 'test'
  and (timestamp >= '2016-11-28' and timestamp <= '2016-12-28')

table rr has index on 'idx', 'midx' and 'timestamp'

Any idea why??

nolsigan
  • 7
  • 4

2 Answers2

0

My guess is that it is due to shortcircuiting, basically the sub select is only executed when the timestamp condition is true.

You will probably get the same performance results when the timestamp condition is true for most of the rows.

This SO has some info on this.

And also this one, marked as duplicate as the former.

Community
  • 1
  • 1
Klinger
  • 4,900
  • 1
  • 30
  • 35
  • Thank you for your answer. Could you explain a little more specifically? I understood that sub select is only executed when timestamp condition is true for the first query, but why not for the second query? Shouldn't optimizer choose to run index scan on timestamp regardless of clause ordering? – nolsigan Dec 28 '16 at 07:44
  • 'Why' is a philosophical problem. Is the problem that you're having really a philosophical one? – Strawberry Dec 28 '16 at 07:54
  • Hmm.. I don't think so. I just want to be sure whether I have to consider where clause ordering every time I try to write sql query or just trust optimizer that it will do the job for me. – nolsigan Dec 28 '16 at 08:02
  • I am no specialist on this, and so I will answer with another question: how is the optimizer to know that the first condition is better than the second? – Klinger Dec 28 '16 at 08:02
  • As far as I know parenthesis will force a certain order, and from my experience short circuiting does affect the order of evaluation, but like I said I am not authority on this. – Klinger Dec 28 '16 at 08:04
  • @Klinger Alright! Thank you a lot. I tried changing clause order with one timestamp clause and one non-select clause, and ordering didn't affect this one. It seems like your other question is a good one. – nolsigan Dec 28 '16 at 08:17
0

If all the info is readily available, the Optimizer will look at all the parts of WHERE, then decide which parts to do first. When the order is semantically irrelevant, the order is ignored.

One rule, actually intended as an optimization, is that MATCH...AGAINST will be performed first.

However, in a few situations (such as yours), the Optimizer has no clue of which part would be better. So it fell back on hoping that the user knows best; short-circuiting could make a huge difference in the performance.

Parentheses do not control order, only semantics. (Often they are simply redundant.)

(select name from xx where midx=rr.midx) like 'test' could be turned into EXISTS (SELECT * FROM xx WHERE midx=rr.midx AND name = 'test') -- The Optimizer does understand EXISTS(...) and does a better job than your formulation.

"table rr has index on 'idx', 'midx' and 'timestamp'" -- ambiguous. Is that three 1-column indexes? Or one 3-column composite index? (There is a big difference.) These are desirable:

rr:  INDEX(timestamp)
xx:  INDEX(midx, name)
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Great answer! I'll give a try with EXISTS. Thanks! – nolsigan Dec 29 '16 at 04:07
  • I suggest using IN instead of EXISTS since then the optimizer may choose to do SEMIJOIN transformation, and convert the subquery into a join. In other words, something like: "SELECT idx FROM rr WHERE timestamp BETWEEN '2016-11-28' AND '2016-12-28' AND midx IN (SELECT midx FROM xx WHERE name = 'test'). The indexes suggested by Rick will still be useful. – Øystein Grøvlen Jan 02 '17 at 07:35
  • The semijoin optimization was not added until 5.6.10 (or maybe later). @nolsigan - what version are you using? – Rick James Jan 02 '17 at 21:18
  • mysql version is 5.6.28. So it seems like using IN is a good choice.:D – nolsigan Jan 03 '17 at 07:03