
I have a piece of C# code which updates two specific columns for roughly 1,000 matches × 20 players (about 20,000 rows) in a database on localhost. As far as I know (though I am really far from being a database expert), it should not take long, but it takes more than 5 minutes.

I tried SQL transactions, with no luck. SqlBulkCopy seems a bit of an overkill, since it's a large table with dozens of columns, and I only have to update one or two columns for a set of records, so I would like to keep it simple. Is there a better approach to improve efficiency?

The code itself:

public static bool UpdatePlayers(List<Match> matches)
{
    using (var connection = new SqlConnection(Database.myConnectionString))
    {
        connection.Open();
        SqlCommand cmd = connection.CreateCommand();

        foreach (Match m in matches)
        {
            cmd.CommandText = "";
            foreach (Player p in m.Players)
            {
                // Some player-specific calculation, which takes almost no time.
                p.Morale = SomeSpecificCalculationWhichMilisecond();
                p.Condition = SomeSpecificCalculationWhichMilisecond();

                cmd.CommandText += "UPDATE [Players] SET [Morale] = @morale, [Condition] = @condition WHERE [ID] = @id;";
                cmd.Parameters.AddWithValue("@morale", p.Morale);
                cmd.Parameters.AddWithValue("@condition", p.Condition);
                cmd.Parameters.AddWithValue("@id", p.ID);
            }
            cmd.ExecuteNonQuery();
        }
    }
    return true;
}
dodo
    Which dbms are you using? – jarlh Jun 17 '21 at 22:18
    Looks like you are adding thousands of parameters and not clearing out that CommandText, since you keep appending to it. – LarsTech Jun 17 '21 at 22:19
    You're updating n rows 1 at a time in a loop. If performance is your goal you should consider a single update in a set-based manner, which on 20,000 rows (not records) would take maybe a few seconds. – Stu Jun 17 '21 at 22:21
  • @jarlh I am using MSSQL. – dodo Jun 17 '21 at 22:29
  • @LarsTech actually I only clear the text in the outer foreach loop. I am only adding 10-20 parameters in the inner foreach loop. – dodo Jun 17 '21 at 22:30
  • Does this answer your question? [Insert 2 million rows into SQL Server quickly](https://stackoverflow.com/questions/13722014/insert-2-million-rows-into-sql-server-quickly) –  Jun 17 '21 at 22:31
    You are never clearing out the parameter collection, so you keep adding to it. – LarsTech Jun 17 '21 at 22:35
  • @LarsTech Oh, oh, I see it now, you are right. Can that be a significant performance problem? – dodo Jun 17 '21 at 22:37
  • The `cmd.CommandText += ...` should probably be changed to `cmd.CommandText = ...` and move the `cmd.ExecuteNonQuery();` line inside the for-each block. – LarsTech Jun 17 '21 at 22:39
    Saving twenty-thousand records one at a time like this is going to be slow no matter how you do it. – Robert Harvey Jun 17 '21 at 22:39
  • @MickyD Thank you, I probably didn't really understand the concept of SqlBulkCopy then. I will try to figure out how to implement it in my case and get back with some results. – dodo Jun 17 '21 at 22:39
  • @dodo that's quite ok. Eventually we all reach a point when conventional code for _mass inserts_ no longer scales and so `SqlBulkCopy` is the only option. [This SO page](https://stackoverflow.com/questions/12187768/how-does-sqlbulkcopy-work) has an excellent description on how it works and why it is so fast –  Jun 17 '21 at 22:46
  • yes I deleted my earlier comment re "[SqlBulkCopy is the only way]" after I re-visited your code. I think **@LarsTech makes some very good points** - it _might be just a simple bug as described above_. So it's entirely possible that `SqlBulkCopy` isn't required in this case. –  Jun 17 '21 at 23:34
  • For proper help on SQL performance issues, you need to provide us the table and index definitions, and share the query plan via https://brentozar.com/pastetheplan. You may want to investigate either a temp table with `SqlBulkCopy`, or a Table Valued Parameter – Charlieface Jun 17 '21 at 23:52
    Your code is running each update with a commit, which is the default transactional handling for the .NET SQL libraries, so there will definitely be a delay while each UPDATE is committed. I would recommend running groups of records within a transaction, which would be faster. – Andrew Halil Jun 18 '21 at 00:32
  • When the CommandText is being appended and not cleared that would cause the SQL query parser and optimizer to take longer processing each command that is executed within the loop. As @LarsTech pointed out, clear the command text and the parameter list at the top of the inner loop block. – Andrew Halil Jun 18 '21 at 00:48
    "*I tried SQL Transactions, with no luck.*" => I have serious doubts about this: did you really do `BeginTransaction; ForEach DoCommand; Commit; Catch Rollback;`? Or did you do one transaction per iteration? 20,000 inserts or updates in one transaction are done in very few seconds, less than one or two, or three at worst, under SQLite. So with SQL Server... –  Jun 18 '21 at 04:51
    Does this answer your question? [how to use sqltransaction in c#](https://stackoverflow.com/questions/19165291/how-to-use-sqltransaction-in-c-sharp) and [UPDATE faster + BEGIN TRANSACTION](https://stackoverflow.com/questions/24444838/update-faster-in-sqlite-begin-transaction) –  Jun 18 '21 at 04:57
    And don't use [addwithvalue](http://www.dbdelta.com/addwithvalue-is-evil/) – SMor Jun 18 '21 at 12:06
  • We don't put solutions into the questions around here. Use the answer box if you want to add your own answer. – LarsTech Jun 18 '21 at 19:09

2 Answers


Updating 20,000 records one at a time is a slow process, so taking over 5 minutes is to be expected.

From your query, I would suggest putting the data into a temp table, then joining the temp table to the [Players] table in the UPDATE. This way it only has to scan the table being updated once, and can update all the values in a single statement.

Note: it could still take a while to do the update if you have indexes on the fields you are updating and/or there is a large amount of data in the table.

Example update query:

    UPDATE P
    SET [Morale] = TT.[Morale], [Condition] = TT.[Condition]
    FROM        [Players] AS P
    INNER JOIN  #TempTable AS TT ON TT.[ID] = P.[ID];

Populating the temp table

How to get the data into the temp table is up to you. I suspect you could use SqlBulkCopy but you might have to put it into an actual table, then delete the table once you are done.

If possible, I recommend putting a Primary Key on the ID column in the temp table. This may speed up the update process by making it faster to find the related ID in the temp table.
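A minimal C# sketch of this approach (untested) could look like the following. It assumes the `Player` fields and connection string from the question, and guesses `INT` for all three column types; adjust these to match the real schema:

```csharp
// Sketch (untested): bulk-load the new values into a temp table,
// then apply them with a single set-based UPDATE.
using (var connection = new SqlConnection(Database.myConnectionString))
{
    connection.Open();

    // The temp table lives for the lifetime of this connection.
    using (var create = new SqlCommand(
        "CREATE TABLE #PlayerUpdates ([ID] INT PRIMARY KEY, [Morale] INT, [Condition] INT);",
        connection))
    {
        create.ExecuteNonQuery();
    }

    // Collect all updated values in memory.
    var table = new DataTable();
    table.Columns.Add("ID", typeof(int));
    table.Columns.Add("Morale", typeof(int));
    table.Columns.Add("Condition", typeof(int));
    foreach (var m in matches)
        foreach (var p in m.Players)
            table.Rows.Add(p.ID, p.Morale, p.Condition);

    // Bulk copy into the temp table in one round trip.
    using (var bulk = new SqlBulkCopy(connection) { DestinationTableName = "#PlayerUpdates" })
    {
        bulk.WriteToServer(table);
    }

    // One set-based UPDATE instead of 20,000 single-row ones.
    using (var update = new SqlCommand(
        "UPDATE P SET P.[Morale] = T.[Morale], P.[Condition] = T.[Condition] " +
        "FROM [Players] AS P INNER JOIN #PlayerUpdates AS T ON T.[ID] = P.[ID];",
        connection))
    {
        update.ExecuteNonQuery();
    }
}
```

Because the temp table is session-scoped, it is dropped automatically when the connection closes, so no explicit cleanup is needed.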

Trisped
  • Ignore last comment, I just read your last paragraph. +1 –  Jun 17 '21 at 23:32
  • Hope you don't mind but I made some minor formatting changes –  Jun 17 '21 at 23:38
  • @MickyD Saw that. Looks like it makes it easier to read, so I am fine with it. dodo used a period for the thousands separator, which is why I left off the comma. It should still be readable though, especially since most SO users use the comma as the thousands separator. Calling it out here just in case. – Trisped Jun 17 '21 at 23:40

Minor improvements:

  • use a string builder for the command text
  • ensure your parameter names are actually unique
  • clear your parameters before the next use
  • depending on how many players are in each match, batch N commands together rather than one match at a time.
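A sketch of those fixes applied to the question's loop (untested; it reuses `cmd` and `matches` from the question, and makes parameter names unique with a running counter):

```csharp
// Sketch (untested): batched UPDATEs with unique parameter names,
// clearing the command text and parameters between batches.
var sql = new StringBuilder();
int i = 0;

foreach (Match m in matches)
{
    foreach (Player p in m.Players)
    {
        p.Morale = SomeSpecificCalculationWhichMilisecond();
        p.Condition = SomeSpecificCalculationWhichMilisecond();

        // Unique parameter names per row, e.g. @morale0, @morale1, ...
        sql.AppendFormat(
            "UPDATE [Players] SET [Morale] = @morale{0}, [Condition] = @condition{0} WHERE [ID] = @id{0};",
            i);
        cmd.Parameters.AddWithValue("@morale" + i, p.Morale);
        cmd.Parameters.AddWithValue("@condition" + i, p.Condition);
        cmd.Parameters.AddWithValue("@id" + i, p.ID);
        i++;
    }

    cmd.CommandText = sql.ToString();
    cmd.ExecuteNonQuery();

    // Reset for the next batch.
    sql.Clear();
    cmd.Parameters.Clear();
    i = 0;
}
```

Keep in mind SQL Server allows at most 2,100 parameters per command, so at three parameters per row a batch must stay under roughly 700 rows; ~20 players per match is well within that.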

Bigger improvement:

  • use a table-valued parameter and a MERGE SQL statement, which should look something like this (untested):
CREATE TYPE [MoraleUpdate] AS TABLE (
    [Id] ...,
    [Condition] ...,
    [Morale] ...
)
GO
MERGE [dbo].[Players] AS [Target]
USING @Updates AS [Source]
ON [Target].[Id] = [Source].[Id]
WHEN MATCHED THEN
    UPDATE SET [Morale] = [Source].[Morale],
        [Condition] = [Source].[Condition];

Then populate and pass the table from C#:
DataTable dt = new DataTable();
dt.Columns.Add("Id", typeof(...));
dt.Columns.Add("Morale", typeof(...));
dt.Columns.Add("Condition", typeof(...));

foreach(...){
    dt.Rows.Add(p.Id, p.Morale, p.Condition);
}

// cmd.CommandText should be the MERGE statement above.
SqlParameter sqlParam = cmd.Parameters.AddWithValue("@Updates", dt);
sqlParam.SqlDbType = SqlDbType.Structured;
sqlParam.TypeName = "dbo.MoraleUpdate";

cmd.ExecuteNonQuery();

You could also implement a DbDataReader to stream the values to the server while you are calculating them.
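Along the same lines, a TVP value can also be an `IEnumerable<SqlDataRecord>`, so rows are streamed to the server as they are produced instead of being buffered in a DataTable first. A sketch (untested; it assumes the `MoraleUpdate` table type above with `INT` columns, which is a guess):

```csharp
// Sketch (untested): stream rows to the server instead of buffering them.
// SqlMetaData and SqlDataRecord live in the Microsoft.SqlServer.Server namespace.
static IEnumerable<SqlDataRecord> StreamUpdates(IEnumerable<Match> matches)
{
    // Column order and types must match the MoraleUpdate table type.
    var metaData = new[]
    {
        new SqlMetaData("Id", SqlDbType.Int),
        new SqlMetaData("Condition", SqlDbType.Int),
        new SqlMetaData("Morale", SqlDbType.Int)
    };

    foreach (var m in matches)
    {
        foreach (var p in m.Players)
        {
            var record = new SqlDataRecord(metaData);
            record.SetInt32(0, p.ID);
            record.SetInt32(1, p.Condition);
            record.SetInt32(2, p.Morale);
            yield return record; // sent to the server as it is enumerated
        }
    }
}

// Usage: pass the enumerable as the structured parameter value.
// var param = cmd.Parameters.AddWithValue("@Updates", StreamUpdates(matches));
// param.SqlDbType = SqlDbType.Structured;
// param.TypeName = "dbo.MoraleUpdate";
```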

Jeremy Lakeman