
My script needs to run through lots of rows, analyze their content, and then selectively update some of them.

Obviously I have a loop to evaluate each row...

I normally would issue an SQL UPDATE query within this loop, for the current row being evaluated. In fact I like this better; it's more straightforward and cleaner in my code, etc.:

    $q = "UPDATE mytable SET status='online' WHERE id='22'";

But lately I've been using "IN" clauses: the loop gathers the ids into a CSV string, and then later does:

    // only fire the query if any ids were collected
    if ($ids != '') {
        $q = "UPDATE mytable SET status='online' WHERE id IN(22,25,147)";
    }

So is either of the two techniques SIGNIFICANTLY better in reality? Or might I as well stick with what I'm comfortable with...

And yes, for my code and the other tasks I have to do, it's easier to have the individual UPDATE statements fire inside the loop...
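For reference, the two approaches look roughly like this, sketched in Python with sqlite3 as a stand-in for the PHP/MySQL code (the full script isn't shown in the question; the table and column names follow the question, and the specific ids are illustrative):

```python
import sqlite3

# In-memory stand-in for "mytable" with an indexed primary key id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO mytable (id, status) VALUES (?, 'offline')",
                 [(i,) for i in range(1, 201)])

# Technique 1: one UPDATE per row, fired inside the evaluation loop.
for row_id in (22, 25, 147):
    conn.execute("UPDATE mytable SET status='online' WHERE id=?", (row_id,))

# Technique 2: collect ids during the loop, then issue a single IN(...) update.
ids = [23, 26, 148]
if ids:  # i.e. "if ids exist do query"
    placeholders = ",".join("?" * len(ids))
    conn.execute("UPDATE mytable SET status='online' WHERE id IN (%s)" % placeholders,
                 ids)

conn.commit()
```

Using placeholders rather than interpolating a raw CSV string into the statement also sidesteps SQL injection, if the ids ever come from user input.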

BrownChiLD
  • `IN` is generally a slower approach towards this, however you could run a benchmark and calculate the time it takes on both for your case and decide accordingly – Hanky Panky Apr 17 '13 at 03:45
  • You can profile your query and see the time it takes to execute. Normally a single update statement will run faster, but multiple update statements will take a significant amount of database resources and time to execute. So in this case IN should give you more performance. – ATR Apr 17 '13 at 03:49
  • possible duplicate of [MYSQL OR vs IN performance](http://stackoverflow.com/questions/782915/mysql-or-vs-in-performance) – hookenz Apr 17 '13 at 03:53
  • thank you all, im amazed at how fast people help here at stack.... do you guys stare at the screen and jump on new questions? haha. that's mighty noble of ya'll. -gratitudes. – BrownChiLD Apr 17 '13 at 03:53
  • @Matt: this is **not duplicate**, as OP does not consider using `OR` – mvp Apr 17 '13 at 04:05

1 Answer


A single query with IN (<multiple id's>) should be better than multiple queries with one id each. This is because the server has to parse your SQL statement and create a query plan, and the cost of parsing is often comparable to the actual execution time. Parsing one statement with 100 ids costs about the same as parsing one with a single id.
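The per-statement overhead can be measured by timing both variants. A rough sketch, again in Python with sqlite3 as an assumed stand-in (absolute numbers will differ from a networked MySQL server, where each extra statement also costs a round trip, so this understates the gap):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO mytable (id, status) VALUES (?, 'offline')",
                 [(i,) for i in range(10000)])
ids = list(range(0, 10000, 2))

# Variant 1: one statement per id; each distinct SQL text is parsed and planned.
t0 = time.perf_counter()
for i in ids:
    conn.execute("UPDATE mytable SET status='online' WHERE id=%d" % i)
per_row = time.perf_counter() - t0

conn.execute("UPDATE mytable SET status='offline'")  # reset for the second run

# Variant 2: one statement with an IN(...) list; parsed and planned once.
t0 = time.perf_counter()
csv = ",".join(map(str, ids))
conn.execute("UPDATE mytable SET status='online' WHERE id IN (%s)" % csv)
single = time.perf_counter() - t0

print("per-row: %.4fs, single IN: %.4fs" % (per_row, single))
```

Both variants leave the table in the same state; only the timings differ, and they are what a benchmark on the real workload should compare.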

However, if the number of collected ids grows beyond about 100, a better approach is to create another table to hold these ids. If that table has an index on id, it would be ideal for performance.
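That staging-table variant might look like the following sketch (Python/sqlite3 again as an assumed stand-in; the `pending_ids` table name is made up for illustration, and in MySQL you would use `CREATE TEMPORARY TABLE` and could rewrite the subquery as a multi-table `UPDATE ... JOIN`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO mytable (id, status) VALUES (?, 'offline')",
                 [(i,) for i in range(1000)])

ids = list(range(0, 1000, 3))  # ids collected by the evaluation loop

# Stage the ids in an indexed temp table (PRIMARY KEY provides the index on id).
conn.execute("CREATE TEMP TABLE pending_ids (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO pending_ids (id) VALUES (?)", [(i,) for i in ids])

# One set-based update driven by the staged ids.
conn.execute("UPDATE mytable SET status='online' "
             "WHERE id IN (SELECT id FROM pending_ids)")
conn.commit()
```

Keeping the ids in a table rather than in a growing SQL string also means the batch survives a client restart, which is the restartability point made in the comments below.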

mvp
  • Thanks, but how much better is the question too. Is it SIGNIFICANTLY better for me to go through the hassle? And why would you suggest a temporary table beyond 100 ids? Does the IN clause "strain" the system after 100 or so? And would a looping update be better by then? Yes, the id is indexed/primary. – BrownChiLD Apr 17 '13 at 03:55
  • If you keep adding id's into your SQL statement on the fly, it gets bigger and bigger, and slower to do on a client even without considering SQL overhead. Also, you will **have to** switch to table if your statement grows more than 1MB (default buffer limit). If you have 100s of ids, you'll probably want ability to restart your process if it is interrupted. If you keep ids in a table, this is easy. But, if you keep it in memory, not so much. – mvp Apr 17 '13 at 04:00
  • Well, my script could be dealing with anywhere between 1 and 5000 records on a regular basis (every 3 seconds or so). This is kind of a live, semi-real-time thing triggered by java/ajax intervals. Anyway, I still don't get the comparison with separate queries in a loop: would that be more stable/better then, since each query is started and closed (not buffered up to hundreds of ids MySQL has to keep in memory)? Is that how it works? I use INNODB and mysql transactions anyway, so I'm not worried about interruptions; it will simply finish or not. – BrownChiLD Apr 17 '13 at 04:54
  • 5000 ids should translate to no more than 25KB SQL with `IN(...)` - you are probably ok with this (without temp table). It depends on your performance requirements. If it is not fast enough today, the only way to confirm is to try and benchmark both approaches. But, I am almost certain that temp table approach should be faster than anything else - but you should use multivalued `INSERT` like `INSERT INTO temptable (id) VALUES(1),(4),(7),...` – mvp Apr 17 '13 at 05:11
  • Yep, ok, thanks very much for the input; I just wanted a bit of an idea on this subject, and you've helped a lot. Re the table: I'd rather avoid an extra table and other hoops and loops; I like to code as straightforwardly as possible. I'm going to go with the IN solution for now, and instead of adding a table later when the thing gets too slow, I can introduce a sort of LIMIT on the queries instead: increase the interval of checks and limit updates to 100 ids per second or two. I'll just keep a session var so my script knows where it left off. – BrownChiLD Apr 17 '13 at 07:44