1

I have thousands of records I need to update, but they are non-sequential, and follow no pattern. Something like this:

161,177,178,180,181, [...] 9515,9516,9519,9521,9522

Let's say I want to do something like this:

UPDATE mytable
SET status="3"
WHERE indexID = 9161;

then repeat for each of the above. Is it OK to use a massive IN?

UPDATE mytable
SET status="3"
WHERE indexID IN (161,177,178,180,181, [...] 9515,9516,9519,9521,9522)

I'm sure the above would work with "several"... but with thousands? Currently 10k is about my upper limit, but that could expand. I don't think 100k would ever be a reality.

I tried it on a set of roughly 6000 and it worked (took about 30 seconds I think - I didn't time it) so this isn't really a problem for me now, but could be down the road.

Is there a better way?

bcsteeve
  • 973
  • 9
  • 22
  • 1
    From what I gather myself in is still only 1 command that is sent while with = you would need thousands of commands. Thus I would think in to be faster normally. The only questionable thing is: can in handle 10k numbers or not (10-50 in one go it CAN handle but 10k I'm not so sure). Lets see the answers but in the worst case you should gather the values into 10-50 number blocks and put these each into an in (at least imho). – Thomas Oct 09 '15 at 06:40
  • No duplicate as you are asking which variant is better but still good information on the in statement is here (especially in the duplicates of that question): http://stackoverflow.com/questions/8808845/mysql-is-there-a-limit-to-the-entries-of-an-in-clause – Thomas Oct 09 '15 at 06:43
  • Have you tried the 10k items in the `IN` clause? As per my knowledge MySQL can handle that. **Note** you may need to increase [max_allowed_packet](https://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_max_allowed_packet) – bansi Oct 09 '15 at 06:51
  • @bansi that is also what I found out. the question is though from what I gather from the OP if it is faster still than the 1 update after each other method (or going by 50 in entry blocks) as the in statement from what I read in the question I put in the comment and its duplicates slows down. And the OP seems to want to get the best suited method there (aka the working fastest I take it) – Thomas Oct 09 '15 at 06:56
  • I tried it with roughly 6000 and it worked after choking on it for about 30 seconds. I guess I'm OK, but yeah I'd like to know if there's a better way. Particularly if my SET is something like 400 chars long. – bcsteeve Oct 09 '15 at 06:59
  • @bcsteeve I would suggest to put that info about set into the questoin itself and also maybe change the question and its title so that its clear you are looking for the fastest way there. – Thomas Oct 09 '15 at 07:11
  • Use `BETWEEN` instead. – Hotdin Gurning Oct 09 '15 at 07:29
  • @RubahMalam he mentioned that those unmbers have gaps in between. I don't think betwen is thus an option. – Thomas Oct 09 '15 at 07:47
  • @Thomas I did have "fastest" (and "best", etc) as the title, but it "warned" me that my question was subjective and would be deleted, so I changed it. I've updated title and question as you suggest. – bcsteeve Oct 09 '15 at 08:00
  • @RubahMalam I changed the example data so it is more obvious they are non-sequential. – bcsteeve Oct 09 '15 at 08:04
  • If the values that you want to exclude less than the values you include, then `between` clause should faster like : * where id between 1 and 20 and id not in (3,5,6,14) * – Hotdin Gurning Oct 09 '15 at 08:07
  • @RubahMalam First off, it is far more difficult for me to generate an exclusion list... but even so, I still don't get how between helps? "between 1 and 20" is not the same as "3,5,6,14"... of course it would include 1,2,4,7-13,14-20 which is - in your example - not desired. What am I missing? – bcsteeve Oct 09 '15 at 08:12
  • @bcsteeve You can see it this way: Ands are equated sequentially. This means. If the first and part fails the 2nd isn't checked. IF the min and max are already limiteing the possible numbers a lot it makes sense to use a combination of either Between a and b AND in (or using and not in depending on taste). Thus the between could limit how many numbers need to be checked one by one if the min and max are possible to be of appropriately limiting numbers. – Thomas Oct 09 '15 at 09:09

0 Answers0