13

I'm just wondering if the amount of id's in a list will influence query performance.

query example:

SELECT * FROM foos WHERE foos.ID NOT IN (2, 4, 5, 6, 7)

Where (2, 4, 5, 6, 7) is an indefinitely long list.

And how many is too many (in context of order)?

UPDATE: The reason why i'm asking it because i have two db. On of it (read-only) is the source of items and another one contain items that is processed by operator. Every time when operator asking for new item from read-only db I want to exclude item that is already processed.

user4052054
  • 395
  • 1
  • 6
  • 22
Pol
  • 24,517
  • 28
  • 74
  • 95
  • 16
    Shouldn't you have also tagged this DB2? And what on earth is "infinitely grooving"? Should I be scared or excited? – Aaron Bertrand Jun 22 '12 at 19:51
  • @AaronBertrand I know, sorry, but it was kinda bugging me considering the title says Mysql but then every RDBMS tag was added. – Taryn Jun 22 '12 at 19:56
  • 1
    Your question is a bit like asking "At what speed is driving in reverse dangerous?" In this case, there are many ways to get at what you're asking but we'd probably be more helpful if you'd provide the *actual* reason for your request. – swasheck Jun 22 '12 at 19:57
  • 4
    "Groove is in the list" is a new Dee-Lite smash – Aaron Bertrand Jun 22 '12 at 19:59
  • 3
    Can't speak for the others, but I'll tell you why I down-voted: (1) you asked a basic "will this be slow?" question that you could have tested on your own in less time than it took to write the question. When you have an actual perf problem, post a specific question. (2) you intentionally tagged inappropriately. If you think losing -8 out of ~1000 rep is bad, go post an iPhone question to an Android forum and see what happens. I promise the reaction will not be "girlish." For future reference, please read: http://dba.blogoverflow.com/2012/06/help-us-help-you/ (particularly "tag effectively") – Aaron Bertrand Jun 23 '12 at 15:37
  • @Pol I didn't downvote (yet) but if a more thorough explanation of the reason behind the question is not forthcoming then you may expect one for me. Most people here are skilled professionals (with many being world-renowned experts in their field) but it's impossible for even them to discern from thin air the reason for some presenting symptom. So grow up, stop the name-calling, and give us more to work with - I'm sure you'll appreciate the results. I see you've added some information: are these dbs on the same server? If so, you could `INSERT ... SELECT ... LEFT JOIN ...` – swasheck Jun 23 '12 at 15:51
  • @AaronBertrand thanks, I'll review the "tag effectively". I just expected someone to point me the right way to do things. The reason why I tagged it like this, because i think it is related to any of the "YesSQL" databases :). Thank you again for taking time to explain me things. – Pol Jun 23 '12 at 16:23
  • Different products optimize in different ways, and have different syntax and/or features to help you if they don't optimize well. If you're using MySQL, tag it that way. Will table-valued parameters in SQL Server really help you if you can't use them? – Aaron Bertrand Jun 23 '12 at 16:26
  • @swasheck Yes it'is on the same server, but it can be changed any time, so i wanted to keep separately some logic. Probably i'll need to keep copy of all users in my working db, so i can `JOIN`. I dont want to change any stuff on the main db. – Pol Jun 23 '12 at 16:29
  • Although it is database dependent, I would expect any self respecting database to create a hashset from the values in the list, so the execution time would increase as log(N) or slower. The speed of a hashset tends not to be strongly linked to the number of elements in it. – andrew pate May 02 '18 at 08:51

4 Answers4

10

Yes, the amount of IDs in the list will impact performance. A network packet is only so big, for example, and the database has to parse all that noise and turn it into a series of:

WHERE foo.ID <> 2
AND foo.ID <> 4
AND foo.ID <> 5
AND ...

You should consider other ways to let your query know about this set.

Nickolay
  • 31,095
  • 13
  • 107
  • 185
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
4

Here is wacky rewrite of that query that might perform a little better

SELECT * FROM foos
LEFT JOIN
(
    SELECT 2 id UNION
    SELECT 4    UNION
    SELECT 5    UNION
    SELECT 6    UNION
    SELECT 7
) NOT_IDS
USING (id) WHERE NOT_IDS.id IS NULL;

The NOT_IDS subquery does work as shown by the following:

mysql> SELECT * FROM
    -> (
    ->     SELECT 2 id UNION
    ->     SELECT 4    UNION
    ->     SELECT 5    UNION
    ->     SELECT 6    UNION
    ->     SELECT 7
    -> ) NOT_IDS;
+----+
| id |
+----+
|  2 |
|  4 |
|  5 |
|  6 |
|  7 |
+----+
5 rows in set (0.00 sec)

mysql>
RolandoMySQLDBA
  • 43,883
  • 16
  • 91
  • 132
  • Will this be translated by MySQL any differently than the existing query? I imagine the staggering part will still be building that inner `UNION`, depending on the source of the list. But in my head this still translates to `id <> 2 AND id <> 4 AND id <> 5 AND id <> 6 AND id <> 7`... – Aaron Bertrand Jun 23 '12 at 15:43
3

Just for fun, and given your update, I'm going to suggest a different strategy:

You could join across tables like so ...

insert into db1.foos (cols) 
  select cols
    from db2.foos src
  left join db1.foos dst
    on src.pk = dst.pk
  where dst.othercolumn is null

I'm not sure how the optimizer will handle this or if it's going to be faster (depends on your indexing strategy, I guess) than what you're doing.

swasheck
  • 4,644
  • 2
  • 29
  • 56
0

The db's are in the same server? If yes you can make a multi-db query with a left join and take the null ones. (here an example: Querying multiple databases at once ) . Otherwise you can make a stored procedure, pass the id's with a string, and split them inside with a regular expression. I have a similar problem, but within an in-memory db and a postgres db. Luckly my situation is (In...)

Community
  • 1
  • 1
jurhas
  • 613
  • 1
  • 4
  • 12