
This is more of a theory question.

If I'm running 50,000 queries that insert new rows, and 50,000 queries that update those rows, which one will take less time?

linuxbuild
Citizen
    This depends on various factors. Do the tables have indexes? Will you update all the columns the insert would insert into or only a subset thereof? Also, I fail to see the relevance of this question. If in doubt, measure for your use case with your data... – Vinko Vrsalovic Jul 14 '10 at 23:02
  • I'd suggest that since INSERT and UPDATE do not do the same thing, the best one to use would be the one that does what is necessary, and to the devil with how long it takes. See http://c2.com/cgi/wiki?PrematureOptimization – Brian Hooper Jul 15 '10 at 10:37

5 Answers


Insert would be faster, because with an update you first need to search for the record you are going to change and then perform the update.

Though this hardly seems like a valid comparison, as you rarely have a choice between inserting and updating; the two operations fill completely different needs.

EDIT: I should also add that this assumes there are no insert triggers or other situations that could cause potential bottlenecks.
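As the comments on the question suggest, the honest answer is "measure it for your own schema and data." Here's a minimal sketch of such a measurement using Python's built-in sqlite3 module; the table and column names are made up, and real results will vary by engine, indexes, and hardware:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

n = 50000
rows = [(i, "x") for i in range(n)]

# Time 50,000 inserts
start = time.perf_counter()
cur.executemany("INSERT INTO t (id, val) VALUES (?, ?)", rows)
conn.commit()
insert_time = time.perf_counter() - start

# Time 50,000 updates of those same rows, keyed on the primary key
updates = [("y", i) for i in range(n)]
start = time.perf_counter()
cur.executemany("UPDATE t SET val = ? WHERE id = ?", updates)
conn.commit()
update_time = time.perf_counter() - start

print(f"insert: {insert_time:.3f}s, update: {update_time:.3f}s")
```

Swap in your real schema and driver (MySQL, PostgreSQL, etc.) before drawing any conclusions; an in-memory SQLite run only demonstrates the method, not the numbers.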

spinon
    not entirely true... you could have a table listing 'last login' times. You could either update the records or insert new ones every time a user logs in. Though such a behaviour would also depend on access times, table growth rates, etc. – Jonathan Fingland Jul 14 '10 at 23:10
    Not necessarily - Update could be faster due to fewer page splits depending on what the clustered and other indexes are and a myriad of other factors. – Martin Smith Jul 14 '10 at 23:10
  • I agree that there are factors that could contribute to this being not the case. But I think in most scenarios of everyday users this is not. But I could be wrong. – spinon Jul 14 '10 at 23:12
  • An INSERT still has to walk the indexes to insert the new entries into them. So you're still doing the same searches (possibly more, as you're updating all indexes on the table, not just the ones affected by the UPDATE), so the INSERT isn't likely to be any faster. – steveayre Aug 29 '13 at 10:34
  • @steveayre that's an interesting argument. I think the real answer would probably be that every situation is different and it's hard to give a conclusive answer without talking about a specific situation. – spinon Sep 03 '13 at 18:24
  • If I want to merge 2 rows into one, I can delete 2 rows and insert a new one. Or I can delete one and update the one left. So it seems there can be valid use-cases to ask this question. – Sylvain Gantois Nov 21 '22 at 15:19
Insert operation: Create -> Store

Update operation: Retrieve -> Modify -> Store

The insert operation is faster because it skips the retrieve step.

Vishwanath Dalvi

With an insert into the same table, you can always insert all the rows with one query, which is much faster than inserting them one by one. With updates, you can sometimes change several rows at once, but that doesn't apply to every situation; often you have to run one update query per row (for example, when updating a specific id). On a big table, having to find the row and then update it every time is very slow, and in my experience it stays slower even when the table is indexed.
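To illustrate the contrast this answer describes, here's a small sketch (not from the answer itself; the table and values are made up) using Python's sqlite3 module, showing a single multi-row INSERT next to the one-statement-per-row pattern that updates keyed on an id usually force:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# A single INSERT statement can carry many rows at once
cur.execute("INSERT INTO users (id, name) VALUES (1, 'a'), (2, 'b'), (3, 'c')")

# Updates keyed on a specific id typically run one statement per row
for user_id, name in [(1, 'A'), (2, 'B'), (3, 'C')]:
    cur.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
conn.commit()
```

Each of those UPDATE statements pays its own parse-and-lookup cost, which is where the per-row slowdown the answer mentions comes from.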

Vikdor
John

As an aside here, don't forget that by doing loads more inserts than updates, you have more rows when you come to select, so you'll slow down the read operation.

So the real question becomes: what do you care about more, a quick insert or a speedy read? Again, this is dependent on certain factors, particularly (and not yet mentioned) the storage engine, such as InnoDB (which is now the default in MySQL, incidentally).

I agree with everyone else though - there's too much to consider on a case-by-case basis and therefore you really need to run your own tests and assess the situation from there based on your needs.

Lukey

There are a lot of non-practical answers here. Yes, theoretically updates are slower because they have to do the extra step of looking up the row. But this is not at all the full picture if you're working with a database made after 1992.

Short answer: they're the same speed. (Don't pick one operation over the other for the sake of speed, just pick the right operation).

Long answer: When updating, you're writing to memory pages and marking them as dirty. Any modern database will detect this and keep those pages in cache longer (as opposed to a normal select statement, which doesn't set this flag). The cache is also smart enough to hold on to pages that are accessed frequently (see LRU-K). So subsequent updates to the same rows will be pretty much instant, with no lookups needed. This assumes you're updating based on indexed columns such as IDs (I'll talk about that in a second).

Compare this to a rapid stream of inserts: new pages need to be created, and those pages need to be loaded into the cache. Sure, you can put multiple new rows on the same page, but as you keep inserting, each page fills up and is tossed away, never to be used again. So you're not taking advantage of re-using pages in the cache. (As a note, "loading pages into the cache" on a miss is also known as a "page fault", which is the #1 slower-downer of database technology in most environments; MongoDB is always inclined to share this idea.)

If you're updating on the basis of a column that isn't indexed: yeah, that is WAY slower than inserting. It should be infrequent in any app. But mind you, if you DO have indexes on a table, they will speed up your updates but also slow down your inserts, because every newly inserted row has to insert new index entries as well (whereas updates re-use existing index entries instead of generating new ones). See here for more details on how MySQL does it.
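The index effect described above can be made visible without timing anything. The following sketch (my own, not from the answer; table and index names are invented) uses SQLite's EXPLAIN QUERY PLAN to show the same UPDATE going from a full table scan to an index lookup once an index exists, the flip side being that every future INSERT must now maintain that index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, email TEXT, score INTEGER)")

def plan(sql):
    # EXPLAIN QUERY PLAN returns rows whose last column describes the strategy
    return " ".join(row[-1] for row in cur.execute("EXPLAIN QUERY PLAN " + sql))

stmt = "UPDATE t SET score = 1 WHERE email = 'a@b.c'"

before = plan(stmt)  # no index on email: full table scan
cur.execute("CREATE INDEX idx_email ON t (email)")
after = plan(stmt)   # same statement now uses an index lookup

print(before)
print(after)
```

Other engines expose the same information differently (e.g. MySQL's EXPLAIN), but the trade-off is the same: indexes make keyed updates cheap and make every insert pay a maintenance cost.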

Finally, multi-threaded/multi-process environments can also turn this idea on its head. I'm not going to get into that here; it's a whole 'nother can of worms. You can research your type of database plus storage engine, and gauge your app's use of a concurrent environment... or you can just ignore all that and use the most intuitive operation.