
I have a table with 100,000+ records. Customer has asked that we encrypt the username field, copy the encrypted value to a new field, and clear the original username field. The encryption can only be performed in the application, not the database.

The existing codebase used the Entity Framework in the past for these tasks, but never for a table of this size. Legacy code looked something like:

foreach (var phone in db.Phones)
{
    phone.Enc_Serial = Encrypt(phone.Serial);
    phone.Serial = "";
}
db.SaveChanges();

Given this is a bulk update, would there be any advantage to doing this with a raw SQL command? I'm thinking that at least we wouldn't have a ton of tracked objects sitting in the DbContext consuming memory.

var idsAndSerials = db.Phones.Select(p => new { id = p.Id, serial = p.Serial });
foreach (var item in idsAndSerials)
{
    string sql = String.Format("Update phone set Enc_Serial = '{0}' where phoneId = {1}", Encrypt(item.serial), item.id.ToString());
    db.Database.ExecuteSqlCommand(sql);
}
Mister Epic
  • Out of curiosity, why does it matter? Is your legacy method failing to run in a sufficient amount of time? Have you profiled the difference between the two methods for 1k records? – Dan Field Apr 15 '15 at 21:00
  • A foreach loop for one single update at a time would take a long time. I would advise constructing the table on the C# end, then inserting that into a Temp table on the database and then running one single update. – Hozikimaru Apr 15 '15 at 21:00 (a rough sketch of this appears after these comments)
  • If you do this with a "*Raw SQL command*", then technically, you would be performing the encryption in the database and *not* in the application. (Since SQL commands are *requested* by the client, but actually *executed* by the DBMS server). – RBarryYoung Apr 15 '15 at 21:00
  • Adding to @SurgeonofDeath's suggestion, you could also use a table valued parameter and a stored procedure. Wouldn't get rid of your memory concerns on the .NET end, but would get rid of the multiple updates happening on the SQL end. But then I'd ask again: is your legacy method really performing that badly? – Dan Field Apr 15 '15 at 21:04
  • @DanField No, the legacy function is not performing poorly. A colleague is concerned about the amount of memory such a large amount of records would consume during the operation, so I thought I would throw it out to you good folks on SO for input. – Mister Epic Apr 15 '15 at 21:07
  • @DanField thanks for the link, the half million entity count in the answer to the other question does give me greater confidence. – Mister Epic Apr 15 '15 at 21:09
  • EFUtilities have BulkUpdate support: https://github.com/MikaelEliasson/EntityFramework.Utilities/tree/master#batch-update-entities – Mikael Eliasson May 22 '15 at 06:28
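For reference, the temp-table route from Hozikimaru's comment might look roughly like the sketch below. SqlBulkCopy, the #PhoneStaging name, and the column sizes are assumptions for illustration; none of this comes from the thread itself.

// sketch only; requires System.Data and System.Data.SqlClient
var staging = new DataTable();
staging.Columns.Add("PhoneId", typeof(int));
staging.Columns.Add("Enc_Serial", typeof(string));

// encrypt in the application, queue the results for a single set-based UPDATE
foreach (var item in db.Phones.AsNoTracking().Select(p => new { p.Id, p.Serial }).ToList())
{
    staging.Rows.Add(item.Id, Encrypt(item.Serial));
}

using (var conn = new SqlConnection(db.Database.Connection.ConnectionString))
{
    conn.Open();

    // session-scoped temp table; name and column size are illustrative
    using (var create = new SqlCommand("CREATE TABLE #PhoneStaging (PhoneId int PRIMARY KEY, Enc_Serial nvarchar(256))", conn))
    {
        create.ExecuteNonQuery();
    }

    using (var bulk = new SqlBulkCopy(conn) { DestinationTableName = "#PhoneStaging" })
    {
        bulk.WriteToServer(staging);
    }

    // one set-based UPDATE instead of 100,000 individual round trips
    using (var update = new SqlCommand(@"UPDATE p SET p.Enc_Serial = s.Enc_Serial, p.Serial = ''
                                         FROM phone p JOIN #PhoneStaging s ON s.PhoneId = p.phoneId", conn))
    {
        update.ExecuteNonQuery();
    }
}

Dan Field's table-valued-parameter suggestion is the same idea, except the staging data is passed as a parameter to a stored procedure instead of being bulk-copied into a temp table.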

2 Answers


In the example you've provided, no. You're still iterating through each record and issuing a separate UPDATE for each one. At least with the first example, I believe those statements would be executed as a batch and within a transaction, so that either all of the updates succeed or none of them do.
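If you do go the raw-SQL route anyway, a minimal sketch of how to keep it transactional and parameterised, assuming EF6's Database.BeginTransaction and the column names from the question's SQL, would be:

// sketch only; requires System.Data.SqlClient for SqlParameter
using (var tx = db.Database.BeginTransaction())
{
    foreach (var item in db.Phones.AsNoTracking().Select(p => new { p.Id, p.Serial }).ToList())
    {
        db.Database.ExecuteSqlCommand(
            "UPDATE phone SET Enc_Serial = @enc, Serial = '' WHERE phoneId = @id",
            new SqlParameter("@enc", Encrypt(item.Serial)),
            new SqlParameter("@id", item.Id));
    }
    tx.Commit();    // all rows are updated, or none are
}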

mclark1129

Since this is a significant update, I'd suggest creating a tracking table (on the SQL side) in which you sequentially number each of the rows to be updated and store each row's PK value. Also include a column that lets you mark a row as done (say, 0 or 1), and set a foreign key back into the original table via the PK value.
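A minimal sketch of what that tracking table might look like; the PhoneEncryptionTracking, RowNo, and IsDone names are illustrative assumptions, not anything prescribed here:

// sketch only: create and populate the tracking table from the application side
db.Database.ExecuteSqlCommand(@"
    CREATE TABLE PhoneEncryptionTracking (
        RowNo   int IDENTITY(1,1) PRIMARY KEY,                   -- sequential number for batching
        PhoneId int NOT NULL UNIQUE REFERENCES phone(phoneId),   -- FK back to the original table
        IsDone  bit NOT NULL DEFAULT 0                           -- 0 = pending, 1 = encrypted
    );
    INSERT INTO PhoneEncryptionTracking (PhoneId)
    SELECT phoneId FROM phone;");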

Update your data model on the EF side to include the new tracking table. Now you have a new table that will easily let you retrieve, say, 1K record batches to work on at a time. This won't have excessive memory consumption. The application logic can do the encrypting. As you update those records, mark the ones that are updated as "done" in your tracking table.

Get the next 1K of not-done records via the tracking table (using navigation properties to reach the real records), and repeat.

It'll be done very fast and with no undue load. 100,000+ records is not really a lot, especially if you use a divide-and-conquer approach (100+ batches).
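Put together, and assuming the tracking table above has been mapped into the EF model as PhoneEncryptionTracking with a Phone navigation property (MyDbContext, the DbSet name, and the 1K batch size are all assumptions), the loop might look roughly like this:

// sketch only; requires using System.Data.Entity; for the Include() lambda overload
const int batchSize = 1000;
while (true)
{
    using (var ctx = new MyDbContext())   // fresh context per batch keeps the tracked-object count small
    {
        var batch = ctx.PhoneEncryptionTrackings
                       .Where(t => !t.IsDone)
                       .OrderBy(t => t.RowNo)
                       .Take(batchSize)
                       .Include(t => t.Phone)
                       .ToList();

        if (batch.Count == 0)
            break;                        // nothing left to process

        foreach (var t in batch)
        {
            t.Phone.Enc_Serial = Encrypt(t.Phone.Serial);   // encrypt in the application
            t.Phone.Serial = "";
            t.IsDone = true;              // mark the row as done in the tracking table
        }

        ctx.SaveChanges();                // one round of UPDATEs per batch
    }
}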

DWright