1

For example:

SELECT UserID, count(*) FROM Messages WHERE UserID IN (3,6,8,11,12,13, ...)

I guess if you hit the MySQL max query (packet?) size there's a problem, but in terms of optimization, is there a better way to do it when the are a large number of IDs, perhaps by creating a temporary table, inserting the IDs and doing a join? If so, what is the other method and what is the cutoff point to switch to the other method?

Ben Holness
  • 2,457
  • 3
  • 28
  • 49

2 Answers2

2

I've seen 70K items in an IN list. But it was beginning to show sluggishness.

I would re-design the flow of things so that your 500K ids landed in a table (possible a temp table). Then I would do a JOIN.

There are various limits buried in the MySQL code, but most of them are so high that users rarely hit them. You are stressing an area beyond anything I have seen.

I think the code builds an array in memory of the IN list, then does a binary search. This should scale well and be efficient. But the size of this temp array may be pushing the memory allocation to the "swap" point. Swapping is terribly bad for MySQL.

I'm also pretty sure that it de-dups the numbers as it builds that array.

Yes there is a tunable VARIABLE that controls the max length of an SQL statement, but you need only a few MB; the default is perhaps 16MB, so it not croak for 500K.

You have not actually tried it yet? How about trying it with 1M random values.

Rick James
  • 135,179
  • 13
  • 127
  • 222
1

According to MYSQL OR vs IN performance, IN is the fastest way to do this. It also shows that you can do this with thousands of values in the IN; you have to be more specific about "lots", but I don't think there's a practical limitation.

The 4 ways I know of doing this:

IN - fastest, most readable.

OR - slower, less readable - especially in complex where clauses.

UNION

SELECT UserID, count(*) FROM Messages WHERE UserID = 3
UNION
SELECT UserID, count(*) FROM Messages WHERE UserID = 6
UNION 
...

Again, harder to read, and probably slower.

Finally, inserting the required IDs into a temporary table. The extra I/O, and creating an index on the temporary table etc. almost certainly means this will be noticably slower than IN.

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52