
I like ORM tools, but I have often thought that for large updates (thousands of rows), it seems inefficient to load, update and save when something like

UPDATE [table] set [column] = [value] WHERE [predicate]

would give much better performance.
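To make the contrast concrete, here's a minimal LINQ to SQL sketch of the two approaches (the NorthwindDataContext, Products table and column names are just hypothetical examples):

// ORM style: load every matching row, mutate it, and let SubmitChanges
// issue one UPDATE statement per changed object.
using (var db = new NorthwindDataContext())
{
    foreach (var product in db.Products.Where(p => p.Discontinued))
        product.UnitPrice = 0m;
    db.SubmitChanges();
}

// Set-based style: one round trip, no objects materialised at all.
using (var db = new NorthwindDataContext())
{
    db.ExecuteCommand(
        "UPDATE Products SET UnitPrice = {0} WHERE Discontinued = 1", 0m);
}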

However, assuming one wanted to go down this route for performance reasons, how would you then make sure that any objects cached in memory were updated correctly?

Say you're using LINQ to SQL and you've been working with a DataContext: how do you make sure your high-performance UPDATE is reflected in the DataContext's object graph?
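The closest I can see in the API is DataContext.Refresh, which re-reads database state into objects the context is already tracking -- something like this sketch, if I understand it right (same hypothetical names as above):

db.ExecuteCommand(
    "UPDATE Products SET UnitPrice = {0} WHERE Discontinued = 1", 0m);

// Already-tracked objects still hold their old values because of the
// identity map, so re-read them from the database.
var tracked = db.Products.Where(p => p.Discontinued).ToList();
db.Refresh(RefreshMode.OverwriteCurrentValues, tracked);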

This might be a "you don't", or "use triggers on the DB to call .NET code that drops the cache", etc., but I'm interested to hear common solutions to this sort of problem.

Neil Barnwell
  • Isn't this one of the reasons for using an ORM? I.e. that it provides mechanisms to synchronise in-memory objects and updated stored objects? The mechanism will of course be ORM-specific – Mitch Wheat Mar 22 '09 at 01:11
  • Yes, but my point is that if you use an ORM tool, and want to improve performance of bulk updates using a direct SQL method, you lose some of the benefits of ORM. – Neil Barnwell Mar 22 '09 at 10:00

5 Answers


You're right, in this instance using an ORM to load, change and then persist records is not efficient. My process goes something like this:

1) Early implementation uses the ORM (NHibernate, in my case) exclusively

2) As development matures, identify performance issues, which will include large updates

3) Refactor those out to a raw SQL or stored-procedure approach

4) Use NHibernate's Refresh(object) to update cached objects (see the sketch below)
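A minimal sketch of steps 3 and 4, assuming an ISessionFactory named sessionFactory and a version of NHibernate where CreateSQLQuery(...).ExecuteUpdate() is available (the Orders table, its columns and the staleOrdersInMemory collection are illustrative, not from a real project):

using (var session = sessionFactory.OpenSession())
using (var tx = session.BeginTransaction())
{
    // Step 3: push the bulk change straight down to SQL, bypassing the ORM.
    session.CreateSQLQuery(
            "UPDATE Orders SET Status = :status WHERE PlacedOn < :cutoff")
        .SetString("status", "Archived")
        .SetDateTime("cutoff", DateTime.Today.AddYears(-1))
        .ExecuteUpdate();

    // Step 4: re-read database state into any instances this session holds.
    foreach (var order in staleOrdersInMemory)
        session.Refresh(order);

    tx.Commit();
}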

My big problem has been informing other clients that the update has occurred. In most instances we have accepted that some clients will be stale (which is the case with standard ORM usage anyway) and then check a timestamp on update/insert.
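The timestamp check itself can be as simple as this sketch (the LastModified column and Id property are hypothetical):

// Compare the row's last-modified timestamp against our loaded copy
// and reload only when the database has moved on.
var current = session.CreateSQLQuery(
        "SELECT LastModified FROM Orders WHERE Id = :id")
    .SetInt64("id", order.Id)
    .UniqueResult<DateTime>();

if (current > order.LastModified)
    session.Refresh(order);   // our copy was stale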

MrTelly

Most ORMs also have facilities for performing large or "bulk" updates efficiently. The Stateless Session is one such mechanism, available in Hibernate for Java, which apparently will also be available in NHibernate 2.x:

http://ayende.com/Blog/archive/2007/11/13/What-is-going-on-with-NHibernate-2.0.aspx
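For illustration, here's a hedged sketch of what the stateless-session API looks like (assuming an ISessionFactory named sessionFactory; the Customer entity and customersToMigrate collection are hypothetical):

// A stateless session has no first-level cache and no dirty checking,
// so each Update() is written straight through without growing memory.
using (IStatelessSession session = sessionFactory.OpenStatelessSession())
using (ITransaction tx = session.BeginTransaction())
{
    foreach (Customer customer in customersToMigrate)
    {
        customer.Status = "Migrated";
        session.Update(customer);
    }
    tx.Commit();
}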

cliff.meyers

ORMs are great for rapid development, but you're right -- they're not efficient. They're great in that you don't need to think about the underlying mechanisms which convert your objects in memory to rows in tables and back again. However, many times the ORM doesn't pick the most efficient process to do that. If you really care about the performance of your app, it's best to work with a DBA to help you design the database and tune your queries appropriately (or at least to understand the basic concepts of SQL yourself).

Jesse Weigert

Bulk updates are a questionable design. Sometimes they seem necessary; in many cases, however, a better application design can remove the need for bulk updates.

Often, some other part of the application already touched each object one at a time; the "bulk" update should have been done in the other part of the application.

In other cases, the update is a prelude to processing elsewhere. In this case, the update should be part of the later processing.

My general design strategy is to refactor applications to eliminate bulk updates.

S.Lott
  • @Candy Chiu: Are you saying you didn't read the answer? Or the specific examples in the answer weren't clear enough? What does your comment mean? Simply building a bulk update because someone else said "build a bulk update" doesn't address the questionable design decision. – S.Lott Sep 20 '11 at 16:51
  • "refresh the data in the db with some user defined values". In this case, if multiple rows are touched, there's probably a normalization problem. The multiple rows should probably have been extracted to a different table so that a single-row update did the job correctly. – S.Lott Sep 26 '11 at 18:38
  • @SLott: There's only one conceptual entity. – Candy Chiu Sep 26 '11 at 20:27
  • @Candy Chiu: First: "one conceptual entity" and "bulk update" aren't compatible. Bulk == Many. Second: normalization does not change the "conceptual" entities. – S.Lott Sep 26 '11 at 20:33
  • @SLott: One conceptual entity usually means one table definition. Bulk update updates the instances (rows) of this entity (table). Normalization is not needed in this scenario. – Candy Chiu Sep 27 '11 at 14:36
  • @Candy Chiu: I guess you're unclear on normalization. One entity type in one table with bulk updates usually means that normalization was not done correctly. That's pretty much the bottom line. You can keep repeating that one entity type in one table is OK. I'm going to keep repeating that if it leads to bulk updates, it was not OK. One purpose of normalization is to prevent the need for bulk updates. – S.Lott Sep 27 '11 at 15:17
  • @SLott: The one and only one table in this context is very simple. For example, Person with FirstName, LastName, Username, and Gender. If I need to bulk update LastName with values in a file, how do you do it? I hope you see this is NOT a design issue. – Candy Chiu Sep 27 '11 at 18:37
  • @Candy Chiu: The bulk update of the last names based on a transaction file means you should have actually done the transactions **instead** of accumulating a transaction file. – S.Lott Sep 27 '11 at 18:46
  • @SLott: the file is an external input, provided by others. – Candy Chiu Sep 28 '11 at 23:52
  • @Candy Chiu: The file should have been handled as individual SOA/REST requests. Not as accumulation of transaction records in a file. Bulk/batch transactions are a questionable design. – S.Lott Sep 29 '11 at 02:07
  • @SLott: Unfortunately, accumulation of records results from a limitation of another process. Bulk update is a business requirement. Therefore, I said SOMETIMES it is a Task. – Candy Chiu Sep 29 '11 at 14:54
  • @Candy Chiu: Unfortunately, accumulation of records results from a questionable design of another process. – S.Lott Sep 29 '11 at 15:15
  • @SLott: ... which is the reality you have to deal with, unless you fix the entire world's existing 'questionably designed' processes. Good luck! – Candy Chiu Sep 29 '11 at 15:22
  • @Candy Chiu: "unless you fix entire world's existing 'questionable designed' processes". No. Just the one supplier of data for the one bulk feed. That's all that needs to be fixed. Try it. They may actually cooperate and thank you for turning a 24-hour, overnight, error-prone ordeal into real-time processing. – S.Lott Sep 29 '11 at 15:24

ORMs just won't be as efficient as hand-crafted SQL. Period. Just like hand-crafted assembler will be faster than C#. Whether or not that performance difference matters depends on lots of things. In some cases the higher level of abstraction ORMs give you might be worth more than the potentially higher performance, in other cases not.

Traversing relationships with object code can be quite nice, but as you rightly point out, there are potential problems.

That being said, I personally view ORMs as largely a false economy. I won't repeat myself here, but will just point to Using an ORM or plain SQL?

cletus