3

I have two tables, identities and events.

identities has only two columns, identity1 and identity2 and both have a HASH INDEX.

events has ~50 columns and the column _p has a HASH INDEX.

CREATE TABLE `identities` (
  `identity1` varchar(255) NOT NULL DEFAULT '',
  `identity2` varchar(255) DEFAULT NULL,
  UNIQUE KEY `uniques` (`identity1`,`identity2`),
  KEY `index2` (`identity2`) USING HASH,
  KEY `index1` (`identity1`) USING HASH
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

-

CREATE TABLE `events` (
  `rowid` int(11) NOT NULL AUTO_INCREMENT,
  `_p` varchar(255) NOT NULL,
  `_t` int(10) NOT NULL,
  `_n` varchar(255) DEFAULT '',
  `returning` varchar(255) DEFAULT NULL,
  `referrer` varchar(255) DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL,

  [...]

  `fcc_already_sells_online` varchar(255) DEFAULT NULL,
  UNIQUE KEY `_p` (`_p`,`_t`,`_n`),
  KEY `rowid` (`rowid`),
  KEY `IDX_P` (`_p`) USING HASH
) ENGINE=InnoDB AUTO_INCREMENT=5231165 DEFAULT CHARSET=utf8;

So, why does this query:

SELECT SQL_NO_CACHE * FROM events WHERE _p IN (SELECT identity2 FROM identities WHERE identity1 = 'user@example.com') ORDER BY _t

takes ~40 seconds, while this one:

SELECT SQL_NO_CACHE * FROM events WHERE _p = 'user@example.com' OR _p = 'user2@example.com' OR _p = 'user3@example.com' OR _p = 'user4@example.com' ORDER BY _t

takes only 20ms when they are basically the same?


edit:

This inner query takes 3,3ms:

SELECT SQL_NO_CACHE identity2 FROM identities WHERE identity1 = 'user@example.com'

dmmd
  • 2,938
  • 4
  • 33
  • 41
  • For the reference: `WHERE field IN (values list)` can be treated as `WHERE field = value1 OR field = value2 OR ...` but `WHERE IN (subquery)` can not, it's a different thing for MySQL. You can read a bit more [here](http://stackoverflow.com/questions/19654878/inner-join-select-a-b-on-a-and-b-vs-where-a-b-in-selecta-b-in-mysql/19655121#19655121) – Alma Do May 20 '16 at 14:55
  • Nice, @AlmaDo, send this as an answer and I'll accept it. – dmmd May 20 '16 at 15:00
  • Also, if you could elaborate a bit more on the answer, that'd be great. – dmmd May 20 '16 at 15:00

3 Answers3

2

The cause:

MySQL treats conditions IN <static values list> and IN <sub-query> as different things. It is well-stated in documentation that the second one is equal to = ANY() query which can not use index even if that index exists. MySQL is just not ingenious enough to do it. On the opposite, first one is treated as a simple range scan when the index is there meaning that MySQL can easily use the index.

Possible ways to resolve:

As I see it, there are workarounds and you've already even mentioned one of them. So it may be:

  • Using JOIN. If there is a field to join by, this is most likely the best way to solve a problem. Actually, since version 5.6 MySQL already tries to enforce this optimization if it's possible, but that does not work in complex cases or in case where there is no dependent sub-query (so basically if MySQL can not "track" that reference). Looking to your case, this isn't an option and this is actually what is not happening for your sub-query.
  • Querying the sub-resource in the application and forming the static list. Yes, despite the common practice is to avoid multiple queries due to connection/network/query planning overhead, this is the case where actually it can work. In your case, even if you have something like 200ms overhead on all the recounted stuff before, it still worth to query sub-resource independently and substitute static list to next query in the application afterwards.
Alma Do
  • 37,009
  • 9
  • 76
  • 105
-1

this is already asked

it's easier to to manage the IN operator because is only a construct that defines the OR operator on multiple conditions with = operator on the same value. If you use the OR operator the optimizer may not consider that you're always using the = operator on the same value.

Community
  • 1
  • 1
Aniket
  • 119
  • 1
  • 9
-1

Because your query is calling this inner query for each row in events table.

In second case indentity table is not used.

You should use joining instead.

mjpolak
  • 721
  • 6
  • 24
  • Thanks. I can't use join as I'm returning a set of rows to run a where upon. – dmmd May 20 '16 at 14:59
  • Yes you can,for example: SELECT SQL_NO_CACHE * FROM events e INNER JOIN identities i ON e._p=i.identity2 WHERE identity1 = 'user@example.com' ORDER BY _t – mjpolak May 20 '16 at 15:55
  • Won't this consume way more resources than using static values for the IN? Not sure if JOIN is a good workaround for this, mind elaborating in your answer, please? (ps. wasn't me that downvoted you) – dmmd May 20 '16 at 16:10