Why there is significant performance gain when doing multiple insert over single insert in a command

Question

I want to insert around 3000 records, when I go by approach 1 it takes around 2 min to complete, however if i use approach 2 insert completes in less than second. Though approach 2 doesn't adhere to good practice but its giving me good performance gain. Would like to understand why approach 1 takes so much time and can there be a better way to do this

Approach 1:

public static void InsertModelValue(DataSet employeData, int clsaId)
{
    var query = @"INSERT INTO employee (id, name)
                  VALUES (@id, @name)";
    using (var connection = GetOdbcConnection())
    {                      
        connection.Open();                
        var tran = connection.BeginTransaction();
        try
        {                   

            foreach (DataRow row in employeData.Tables[0].Rows)
            {                       
                using (var cmd = new OdbcCommand(query, connection, tran))
                {
                    cmd.Parameters.Add("@id", OdbcType.VarChar).Value = row["ID"];
                    cmd.Parameters.Add("@name", OdbcType.Int).Value = Convert.ToInt32(row["Name"]);
                    cmd.ExecuteNonQuery();
                }
             }
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }                      
   }          
}

Approach 2:

public static void InsertModelValueInBulk(DataSet employeData, int clsaId, int batchSize)
{          
    string[] insertStatement = new string[batchSize];
    using (var connection = GetOdbcConnection())
    {
        connection.Open();
        var tran = connection.BeginTransaction();
        try
        {                               
            int j = 0;
            for (int i = 0; i < employeData.Tables[0].Rows.Count; i++)
            {
                var row = employeData.Tables[0].Rows[i];      
                var insertItem = string.Format(@"select '{0}',{1}", row["name"], Convert.ToInt32(row["ID"]);
                insertStatement[j] = insertItem;
                if (j % (batchSize-1) == 0 && j > 0)
                {
                    var finalQuery = @" INSERT INTO employee (id, name)
     " + String.Join(" union ", insertStatement);
                    using (var cmd = new OdbcCommand(finalQuery, connection, tran))
                    {
                        cmd.ExecuteNonQuery();
                    }
                    j = 0;
                    continue;
                }
                else
                {
                    j = j + 1;
                }
            }

            if (j > 0)
            {

                var finalQuery = @"INSERT INTO employee (id, name)
     " + String.Join(" union ", insertStatement,0,j-1);
                using (var cmd = new OdbcCommand(finalQuery, connection, tran))
                {
                    cmd.ExecuteNonQuery();
                }
            }

            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}

The answer lies in the code itself. In the first approach you are making multiple calls from your application to SQL.. Where as 2 approach you make single call with a query. Run SQL Profiler and you will come to know the difference. — sajoshi, Mar 01 '11 at 07:38
Possibly a stupid question, but have you looked at this: [Performing Batch Operations Using DataAdapters (ADO.NET)](http://msdn.microsoft.com/en-us/library/aadf8fk2.aspx). It's basically what you're doing in the second approach, just more correct. — Roman, Mar 01 '11 at 19:16

score 26 · Accepted Answer · answered Mar 01 '11 at 07:38

26

You want to deposit three thousand dollars in your bank account. Which is faster:

wait for a teller
take a dollar out of your wallet
show your id to the teller
deposit the dollar
go to the end of the line
repeat the whole process 2999 more times, then go home.

or

wait for a teller
take three thousand dollars out of your wallet
show your id to the teller
deposit the three thousand dollars
go home

?

It should be fairly obvious that the first one is a lot slower than the second one. Now is it clear why the first technique is hundreds of times slower than the second?

answered Mar 01 '11 at 07:38

Eric Lippert

647,829
179
1,238
2,067

+1, great analogy. Btw, the three thousand dollars should be in one dollar bills :) – SWeko Mar 01 '11 at 07:49
But Eric, but what if I don't have 3,000 dollars? +1 BTW :) – Binary Worrier Mar 01 '11 at 08:04
But in many places have read approach 2 to be not a recommended solution – Brijesh Mishra Mar 01 '11 at 08:17
@Eric, you have truly big wallet to keep 3000 notes of a single dollar bill :) – bestsss Mar 01 '11 at 08:46
4

@Brijesh: OK, what did those many places you read have to say about *why* the fast approach is a bad idea? – Eric Lippert Mar 01 '11 at 14:46
I usually would prefer avoid approach 2 in favor of something specifically designed for bulk insertion such as `SqlBulkCopy`. See also [this answer](http://stackoverflow.com/questions/4849270/how-to-add-more-than-10000-parameters-to-a-single-sql-command/4849365#4849365) to a different question. – Brian Mar 02 '11 at 03:47

Why there is significant performance gain when doing multiple insert over single insert in a command

1 Answers1