1

I have to update a table containing millions of records. I am storing the IDs of all the records to be updated in a List. The query is generated programmatically as follows:

string queryPart="";
foreach (int id in transactionsToUpdate.ToList())
{
    queryPart+="TransactionID="+id;
    queryPart+=" OR ";
}

queryPart += "1=0";
string query = @"UPDATE dbo.OutgoingQueue SET Status='C' WHERE "+queryPart;

Currently, even with 100,000 values in the list, two problems come up. First, the code above takes a long time to execute (the query-formation part). Second, when I execute the query on the DB it throws a Timeout Expired exception. Is there a better way to achieve what I want?

UPDATE: The first problem, the query taking a long time to build, has been solved by using a StringBuilder. But the second problem remains. If I increase the timeout, I get a SQL out of resources exception instead.

Samarsh
  • 565
  • 5
  • 18
  • Not using EF in my project. – Samarsh Jul 09 '13 at 12:55
  • What about alternatives? An SSIS package would be better for this kind of thing. – Riv Jul 09 '13 at 12:56
  • 2
    Create a stored proc that accepts a comma separated list of ids. Then in the stored proc use the update statement on the ids with an 'IN' clause rather than 'OR'. From .Net pass a comma separated list to the stored proc. – Azhar Khorasany Jul 09 '13 at 12:57
  • 1
    Are the IDs sequential? Could you sort them and use `where TransactionID > x AND TransactionID < y`? Or split the execution into batches? – RemarkLima Jul 09 '13 at 12:58
  • What is the source of transactionsToUpdate? If it is a table in the database, just join it to OutgoingQueue. If you need to build this client side, use a stringbuilder instead of string concatenation. Alternatively, use BulkInsert to push the values to a table and then join it in the update statement. – Jim Wooley Jul 09 '13 at 13:20
  • @AzharKhorasany - Good suggestion. Will try it. But still there should be a way of doing this from code itself. – Samarsh Jul 09 '13 at 13:21
  • @RemarkLima - No, the IDs aren't sequential. It would have been a lot easier if they were. – Samarsh Jul 09 '13 at 13:22
  • @JimWooley : transactionsToUpdate is generated from the code itself during the program execution. And yes, I'll try with stringbuilder. – Samarsh Jul 09 '13 at 13:25
  • 1
    Dump the IDs into a temp table, then do an update based on a join or an IN(x) query. It'll be faster. – Andrew Lewis Jul 09 '13 at 14:07
  • I like @AndrewLewis's idea, also: http://stackoverflow.com/questions/3711217/fastest-way-to-update-120-million-records – Dave Ziegler Jul 09 '13 at 14:34
  • Which database are you using? If SQL Server 2008 or later, I would use a table-valued parameter to pass the list of IDs and update via a join with that TVP. – Kamran Shahid Jul 10 '13 at 09:30
  • 1
    @KamranShahid : Yes, that's what I did finally. – Samarsh Jul 11 '13 at 09:42
  • Good to know you did it. That's the simplest solution, Harsh. Now cursing myself for not putting it in an answer rather than just in the comments, and missing out on good points :) – Kamran Shahid Jul 11 '13 at 10:21

7 Answers

2

This is an ideal use-case for Table-Valued Parameters. See here: http://msdn.microsoft.com/en-us/library/bb675163.aspx

Alternatively, you could also make a #temp table (or a staging table), fill it with SqlBulkCopy (see here), and then JOIN against it to do your UPDATE.
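For illustration, here is a minimal sketch of the TVP route. The names are assumptions rather than anything from the question: it presumes a user-defined table type such as `CREATE TYPE dbo.IdList AS TABLE (Id INT)` already exists, and that connectionString and transactionsToUpdate come from the surrounding code.

using System.Data;
using System.Data.SqlClient;

// Sketch only: dbo.IdList is a hypothetical table type, connectionString a placeholder.
var ids = new DataTable();
ids.Columns.Add("Id", typeof(int));
foreach (int id in transactionsToUpdate)
{
    ids.Rows.Add(id);
}

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    @"UPDATE o SET o.Status = 'C'
      FROM dbo.OutgoingQueue o
      INNER JOIN @Ids i ON o.TransactionID = i.Id", conn))
{
    // A structured parameter sends the whole ID list in one round trip.
    var p = cmd.Parameters.Add("@Ids", SqlDbType.Structured);
    p.TypeName = "dbo.IdList";
    p.Value = ids;

    conn.Open();
    cmd.ExecuteNonQuery();
}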

RBarryYoung
  • 55,398
  • 14
  • 96
  • 137
  • 1
    I have not used these before. Any idea how this translates to an update of 100K+ rows? "...table-valued parameters perform well for inserting less than 1000 rows." http://msdn.microsoft.com/en-us/library/bb510489.aspx#BulkInsert – Dave Ziegler Jul 09 '13 at 15:16
  • 1
    @DaveZiegler `SqlBulkCopy` is probably faster at the high-end, but somewhat more complex because of the #temp table setup and management. I don't think that TVPs are *much* slower at the high-end (according to the SQLCAT team, anyway), and if you batch it in groups of 1000 (which you might want to do anyway) it's probably just as fast. – RBarryYoung Jul 09 '13 at 15:27
  • @DaveZiegler I have added `SqlBulkCopy` as an alternative. (This is actually what I usually use myself.) – RBarryYoung Jul 09 '13 at 15:31
  • I consider a table-valued parameter better than SqlBulkCopy. – Kamran Shahid Jul 11 '13 at 10:37
1

You could find a way to pass all 100,000 values into the database, although if you use individual parameters you run into a limit very quickly (SQL Server allows at most 2,100 parameters per command).

UPDATE: the example is now additionally wrapped in a transaction.

Alternatively, this is what a prepared query is for:

using (var conn = <GETCONNECTIONMETHOD>)
{
  conn.Open();
  using (var tran = conn.BeginTransaction())
  {
    // Associate the command with the connection and the transaction opened above.
    using (var cmd = new SqlCommand(
        @"update dbo.OutgoingQueue set Status = 'C' where TransactionID = @id", conn, tran))
    {
       // Add the parameter once, prepare the statement, then reuse it for every ID.
       var param = cmd.Parameters.Add("@id", SqlDbType.Int);
       cmd.Prepare();
       foreach (int id in transactionsToUpdate.ToList())
       {
         param.Value = id;
         cmd.ExecuteNonQuery();
       }
       tran.Commit();
     }
  }
}

If you have enough permissions to perform a bulk copy, then the best way would be:

using (var conn = <GETCONNECTIONMETHOD>)
{
   conn.Open();

   var dt = new DataTable();
   dt.BeginLoadData();
   dt.Columns.Add("id", typeof(int));
   foreach (int id in transactionsToUpdate.ToList())
   {
     dt.Rows.Add(id);
   }
   dt.EndLoadData();

   // Create a temp table on this connection, bulk copy the IDs into it,
   // then update via a join against it.
   using (var cmdSetup = new SqlCommand(@"create table #tempUpdate(id int)", conn))
   {
      cmdSetup.ExecuteNonQuery();
   }
   var bcp = new SqlBulkCopy(conn);
   bcp.DestinationTableName = "#tempUpdate";
   bcp.WriteToServer(dt);
   using (var cmdUpdate = new SqlCommand(
      @"update o set status = 'C' from dbo.OutgoingQueue o " +
      @"inner join #tempUpdate t on o.TransactionID = t.id", conn))
   {
      cmdUpdate.ExecuteNonQuery();
   }
}
Bob Vale
  • 18,094
  • 1
  • 42
  • 49
  • But I have not marked this as an answer because in this code cmd.ExecuteNonQuery() would be called 100,000 times, which I guess is not good performance-wise. Or is it? – Samarsh Jul 10 '13 at 06:06
  • @HarshMaurya You are right, it's bad to call ExecuteNonQuery() 100,000 times in quick succession. – Azhar Khorasany Jul 10 '13 at 08:19
  • @AzharKhorasany I can understand conceptually why it might be bad to make 100,000 calls to `ExecuteNonQuery`, but do you have any real-world numbers to back up the statement? Why specifically is it bad? I'm not arguing, I would just like to know. – Bob Vale Jul 10 '13 at 08:53
  • @HarshMaurya I've now updated the answer to also include a bulk copy operation, which should be the best way if you have enough permissions. – Bob Vale Jul 10 '13 at 09:17
0

You can do the following things to solve your problem.

  1. Speed issue: What items are in the transactionsToUpdate list? If you fill this list from the database, then I suggest modifying your update query so that you don't need to run the loop at all; it will improve the performance of your application. You can do something like the following query, with no need for looping. It will run a lot faster than your current code, and I think you will not face the timeout problem if you use it.

    UPDATE A
    SET Status = 'C'
    FROM dbo.OutgoingQueue AS A
    INNER JOIN
    (
    QUERY BY WHICH YOU FILL UP CURRENT LIST
    ) AS B ON A.TransactionID = B.ID
    
  2. Timeout issue: You can set the command timeout when you execute the SQL command.

    SqlCommand myCommand = new SqlCommand();
    myCommand.CommandTimeout = 15; // timeout in seconds

Hiren Dhaduk
  • 2,760
  • 2
  • 19
  • 21
  • Regarding the second problem: if I increase the timeout I get "sql out of resource exception", which it says occurs with extremely complex queries. But I don't see why my query is so complex. – Samarsh Jul 09 '13 at 13:48
  • Your query is complex because it contains 100,000 OR conditions in the WHERE clause. That will take a lot of time to run in SQL Server. – Hiren Dhaduk Jul 09 '13 at 13:50
  • Just tell me what is in the transactionsToUpdate list? I have a solution for it. – Hiren Dhaduk Jul 09 '13 at 13:51
  • The transactionsToUpdate list contains integers. – Samarsh Jul 10 '13 at 05:33
  • Integers meaning 1 to 100,000? Are they fixed, or do you fill them from the database? – Hiren Dhaduk Jul 10 '13 at 06:08
  • They are random; they can be anything within the integer range. I fill the list from the database by running some business rules. – Samarsh Jul 10 '13 at 06:16
  • The problem is that you are taking data from the database, generating a query in C#, and then running it against the database again. Instead you should write a query that runs directly in the database and applies your business rules. – Hiren Dhaduk Jul 10 '13 at 06:20
  • @HarshMaurya, I modified my answer. It will really help you. – Hiren Dhaduk Jul 10 '13 at 06:30
0

Assuming transactionsToUpdate is a List of int,

Get a comma separated list here:

string queryPart = String.Join(",", transactionsToUpdate.ToArray());

Then pass it in the query like:

string query = @"UPDATE dbo.OutgoingQueue SET Status='C' WHERE TransactionID IN(" + queryPart + ")";

Or you can create a stored procedure that accepts a comma-separated list of values and pass queryPart to the stored procedure.

UPDATE:

You can then do this as a batched operation through .NET, e.g.:

int count = 0;
int bulkCount = 1000;

while (count < transactionsToUpdate.Count)
{
    string queryPart = String.Join(",", transactionsToUpdate.ToArray().Skip(count).Take(bulkCount));
    string query = @"UPDATE dbo.OutgoingQueue SET Status='C' WHERE TransactionID IN(" + queryPart + ")";

    //execute the sql here by doing the ExecuteNonQuery call.

    count += bulkCount;
}

This query will take the first 1000 from the list, process them, then take another 1000 until all of them are processed.
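To make the commented-out step concrete, here is a minimal sketch of the execution call, assuming an already-open SqlConnection named conn; the 120-second timeout is illustrative, not a recommendation.

// Sketch only: conn is assumed to be an open SqlConnection, query comes from the loop above.
using (var cmd = new SqlCommand(query, conn))
{
    // Give each 1,000-ID batch room to finish; adjust to taste.
    cmd.CommandTimeout = 120;
    cmd.ExecuteNonQuery();
}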

Azhar Khorasany
  • 2,712
  • 16
  • 20
0

The answer depends on the size of the update:

If you have a small number of IDs to update, then you can create a batch of statements and send it in a single transaction:

update dbo.OutgoingQueue set Status = 'C' where TransactionID = %1;
update dbo.OutgoingQueue set Status = 'C' where TransactionID = %2;
update dbo.OutgoingQueue set Status = 'C' where TransactionID = %3;
update dbo.OutgoingQueue set Status = 'C' where TransactionID = %4;
update dbo.OutgoingQueue set Status = 'C' where TransactionID = %5;
update dbo.OutgoingQueue set Status = 'C' where TransactionID = %6;

If there are too many IDs to do it in a single transaction, you can create a temporary table and execute an update query like this:

update dbo.OutgoingQueue set Status = 'C' where
 exists (select null from tmp_tab where tmp_tab.TransactionID = dbo.OutgoingQueue.TransactionID);

If you plan to update all of the records, the best way is not to update at all.

You should create a new table with the new values using:

select <column list> into <table name> from <source>;

In the select you set your new values, and at the end you just rename the tables, as sketched below.
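A rough sketch of that approach, assuming an open SqlConnection named conn; the target table and new-table names are placeholders, and only two columns are carried over for brevity.

// Sketch only: build the replacement table with the new values, then swap names with sp_rename.
using (var cmd = new SqlCommand(@"
    SELECT TransactionID, 'C' AS Status
    INTO dbo.OutgoingQueue_New
    FROM dbo.OutgoingQueue;

    EXEC sp_rename 'dbo.OutgoingQueue', 'OutgoingQueue_Old';
    EXEC sp_rename 'dbo.OutgoingQueue_New', 'OutgoingQueue';", conn))
{
    cmd.ExecuteNonQuery();
}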

0

The tack I would take:

cmd.CommandText = "CREATE TABLE #Values( id  Int )";
cmd.ExecuteNonQuery();


foreach (int id in transactionsToUpdate.ToList())
{
    cmd.CommandText = "INSERT INTO #Values VALUES( " + id.ToString() + ");" 
    cmd.ExecuteNonQuery();
}

Now, rather than 100,000 OR comparisons, you can test for the value by joining against this TEMP table, which is SQL's natural domain. If you'll be doing multiple tests against the same set of numbers, it might make sense to index the table after loading it.
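As a rough sketch of that join, reusing cmd and the #Values temp table from the snippet above; the OutgoingQueue column names are taken from the question.

// Sketch only: assumes the same open connection and the #Values table populated above.
cmd.CommandText =
    @"UPDATE o SET o.Status = 'C'
      FROM dbo.OutgoingQueue o
      INNER JOIN #Values v ON o.TransactionID = v.id";
cmd.ExecuteNonQuery();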

Curt
  • 5,518
  • 1
  • 21
  • 35
0

One suggestion for your code:

Use StringBuilder instead of String; it speeds up your process.

StringBuilder queryPart = new StringBuilder();
foreach (int id in transactionsToUpdate.ToList())
{
    queryPart.Append("TransactionID=");
    queryPart.Append(id);
    queryPart.Append(" OR ");
}
queryPart.Append("1=0");
string query = @"UPDATE dbo.OutgoingQueue SET Status='C' WHERE " + queryPart.ToString();

You should always use StringBuilder if you are doing heavy string manipulation.

EDIT 1: You can check your execution performance using the Stopwatch class.

This will show how much quicker StringBuilder is; it can be 100 to 1,000 times faster than String concatenation.
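For example, a rough timing sketch (the numbers you see will depend on your machine; transactionsToUpdate is assumed from the question):

using System.Diagnostics;
using System.Text;

// Sketch only: times the StringBuilder-based query construction shown above.
var sw = Stopwatch.StartNew();
var sb = new StringBuilder();
foreach (int id in transactionsToUpdate)
{
    sb.Append("TransactionID=").Append(id).Append(" OR ");
}
sb.Append("1=0");
sw.Stop();
Console.WriteLine("StringBuilder build took {0} ms", sw.ElapsedMilliseconds);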

Trikaldarshiii
  • 11,174
  • 16
  • 67
  • 95