4

I have been busy changing some SQL queries around so that they look more readable to the human eye, i was also told that they would be maybe 5-10% faster.

The previous SQL statements would look like this.

SELECT * FROM teams WHERE Team1='Joe Bloggs' OR Team2='Joe Bloggs' OR Team3='Joe Bloggs'

I changed it to

SELECT * FROM teams WHERE 'Joe Bloggs' IN (Team1,Team2,Team3)

The new query is approx 10 times slower, after checking what could be the cause i found that it was not using any indexes and even if i tried to force an index it still wouldn't use it.

The table has approx 120,000 rows and i cannot change the table format as other applications i dont have access to, use it. The Team1,Team2,Team3 columns are all VARCHAR(45)

Could anybody explain why the indexes are used for the original query but not the new one? I've read a ton of pages but cannot find the answer, i've read that its possible that mysql is determining that its faster not to use the index, however that shouldnt be the case here as the IN query is almost 10x slower.

Multiple ORs SELECT (run 1000 times with no Cache) - 12.863906860352 elapsed IN SELECT (run 1000 times with no Cache) - 122.73787903786 elapsed

Thank you for your time.

Twingo
  • 43
  • 3

4 Answers4

5

In the query:

SELECT * FROM teams WHERE 'Joe Bloggs' IN (Team1,Team2,Team3)

you are comparing (looking up) a bunch of columns against a string literal. The optimizer would typically use an index on the search target, in this case Joe Bloggs, to find a value in the IN clause. But, it can't put an index on a string literal. So, everything is reversed here, and this is why the index won't help.

On the other hand, in your first query:

SELECT * FROM teams WHERE Team1='Joe Bloggs' OR Team2='Joe Bloggs' OR Team3='Joe Bloggs'

MySQL would grab the string literals, and then look them up against the various columns using a B-tree index. This behaves as you would expect and are seeing.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • . . I don't think MySQL uses an index for `OR`, although more recent versions may have implemented this optimization. – Gordon Linoff Jul 26 '18 at 14:16
  • @GordonLinoff But then how to explain the OP's observations? – Tim Biegeleisen Jul 26 '18 at 14:17
  • The OR query is definitely using the indexes, it's much faster and using EXPLAIN shows that it is only querying a small amount of rows and the index it is using, vs the IN statement which is querying all the rows in the table and using no index. – Twingo Jul 26 '18 at 14:42
  • Tim, in your experience, should i then stick to the multiple ORs as my bench marking so far seems to indicate that in my use case this is the fastest, or is their still a simpler way to write the query, whilst still using the indexes? – Twingo Jul 26 '18 at 14:52
  • What is your aversion to your original query, which is working for you, other than the visual part of it? – Tim Biegeleisen Jul 26 '18 at 14:54
  • It is called index merge optimisation, see https://dev.mysql.com/doc/refman/5.5/en/index-merge-optimization.html – Shadow Jul 26 '18 at 14:59
  • Tim, its purely down to the visual part of it, in the above example, im only checking the columns for 1 name, in some cases it can be as much as 15 names, so i just roll with a single big OR statement, which looks messy, but is the fastest way ive found so far. I thank you for your time :) – Twingo Jul 27 '18 at 10:56
0

I don't know why the performance would be different -- indexes would not seem to be used in either case.

You can write the query like this:

SELECT t.*
FROM teams t
WHERE Team1 = 'Joe Bloggs'
UNION ALL
SELECT t.*
FROM teams t
WHERE Team2 = 'Joe Bloggs' AND Team1 <> 'Joe Bloggs' 
UNION ALL
SELECT t.*
FROM teams t
WHERE Team3 =  'Joe Bloggs'
  AND Team2 <> 'Joe Bloggs'
  AND Team1 <> 'Joe Bloggs';

This can make use of indexes on (Team1), (Team2, Team1), and (Team3, Team2, Team1).

Rick James
  • 135,179
  • 13
  • 127
  • 222
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • As i mentioned in an above comment, indexes are most certainly used with the multiple OR query, using EXPLAIN before the statement shows its using the index and it shows its only querying a small amount of ROWS, vs the IN statement which is doing a full table scan and using no index. I really thank you for taking the time out to write your answer, unfortunately it ran slower than just using multiple OR and it made human reading the query significantly more difficult to my eyes than just using OR several times. Although i fully appreciate that your way, maybe the correct way to do it. – Twingo Jul 26 '18 at 14:48
  • Yeah, MySQL can use index merge optimisation to use indexes for such `or` conditions, see https://dev.mysql.com/doc/refman/5.5/en/index-merge-optimization.html As you can see, this functionality has been available since v5.5 at least. – Shadow Jul 26 '18 at 14:58
  • @Shadow . . . It is surprising that the merge scan wouldn't be used for an `in` list of columns. – Gordon Linoff Jul 26 '18 at 16:34
  • 1
    @GordonLinoff - MySQL's history of design and development is riddled with doing just enough to say that a feature is implemented. `ALTER` was so simple (to implement) when it always rebuilt the table. Now 5.6, 5.7, and 8.0 have multiple attempts at wringing out every possible optimization. I have been watching `EXPLAINs` since Index merge was implemented (in 4.1?) It has almost never been used, even for queries that look pretty likely. It was probably aimed only at `IN`, not `JOIN`, and certainly not `IN`. I would say that the "inverted IN" is extremely rare. – Rick James Aug 17 '18 at 05:55
  • @GordonLinoff - Another case of feature-exists-but-not-optimized: `WHERE (a, b) > (123, 345)` -- `INDEX(a,b)` is not helpful. The workaround is ugly (though optimized). It is much uglier with 3+ columns. – Rick James Aug 17 '18 at 05:58
  • @GordonLinoff - a justification for the non-use of Index merge: Other vendors go through a "ROWNUM", which makes Index merge shine; InnoDB bypasses such, making it less beneficial. – Rick James Aug 17 '18 at 06:00
0

You’ve got an “inverted IN”; the optimiser will only use an index for column in (value1, value2, value3).

But, if you have separate indexes on each of the 3 columns, there is another way that should yield far better performance than either of your attempts:

SELECT * FROM teams WHERE Team1='Joe Bloggs'
UNION
SELECT * FROM teams WHERE Team2='Joe Bloggs'
UNION
SELECT * FROM teams WHERE Team3='Joe Bloggs'

The table will be queries 3 times, but each time an index will be used.

If you are sure there won’t be any dupes, or you don’t mind dupes, change UNION to UNION ALL to a further speed up (UNION has the extra overhead or de-duping).

Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • I appreciate the answer, i never knew what an "inverted IN" statement was, i changed the query to exactly as you wrote above, but unfortunately after running a benchmark approx 10 times to get a good average, its almost twice as slow as the original OR statement. With 1000 iterations taking 28.898875951767 seconds. It also defeats the main thing i was targeting which was readability, its much easier for me to and others to understand the multiple ORs , but maybe thats just because we are used to looking at these statements for a long time. Thank you for your answer :) – Twingo Jul 26 '18 at 14:45
  • @twingo I just made up the name “inverted IN” to help describe the pattern (I will however keep using it from now on, because it sounds cool). Sorry to hear the rewrite didn’t work out for you. Are you sure there are 3 indexes defined - one index for each column? If so, try running `analyze teams`. However, it could be that MySQL’s optimizer is doing a good job and your original OR query is the way to go. – Bohemian Jul 26 '18 at 14:50
  • @Twingo - When timing, run twice and take the second timing. The first may involve I/O; the subsequent runs are likely to be the same as each other due to caching of data. – Rick James Aug 17 '18 at 05:44
  • 1
    @Twingo - By writing the `OR` in multiple lines and aligning the text, `OR` will be just as clear to read as `IN`. – Rick James Aug 17 '18 at 05:45
0

Plan A: Use FULLTEXT (team1, team2, team3) and MATCH(team1, team2, team3) AGAINST ('+Joe +Briggs' IN BOOLEAN MODE). There are many caveats to using this approach, but, if it applies in your case, it will be very fast.

Plan B: In spite of "cannot change the table format", you may be able to play some game with VIEWs to avoid splaying an array (the teams) across columns.

Rick James
  • 135,179
  • 13
  • 127
  • 222