2

I want to insert around 3000 records, when I go by approach 1 it takes around 2 min to complete, however if i use approach 2 insert completes in less than second. Though approach 2 doesn't adhere to good practice but its giving me good performance gain. Would like to understand why approach 1 takes so much time and can there be a better way to do this

Approach 1:

public static void InsertModelValue(DataSet employeData, int clsaId)
{
    var query = @"INSERT INTO employee (id, name)
                  VALUES (@id, @name)";
    using (var connection = GetOdbcConnection())
    {                      
        connection.Open();                
        var tran = connection.BeginTransaction();
        try
        {                   

            foreach (DataRow row in employeData.Tables[0].Rows)
            {                       
                using (var cmd = new OdbcCommand(query, connection, tran))
                {
                    cmd.Parameters.Add("@id", OdbcType.VarChar).Value = row["ID"];
                    cmd.Parameters.Add("@name", OdbcType.Int).Value = Convert.ToInt32(row["Name"]);
                    cmd.ExecuteNonQuery();
                }
             }
            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }                      
   }          
}

Approach 2:

public static void InsertModelValueInBulk(DataSet employeData, int clsaId, int batchSize)
{          
    string[] insertStatement = new string[batchSize];
    using (var connection = GetOdbcConnection())
    {
        connection.Open();
        var tran = connection.BeginTransaction();
        try
        {                               
            int j = 0;
            for (int i = 0; i < employeData.Tables[0].Rows.Count; i++)
            {
                var row = employeData.Tables[0].Rows[i];      
                var insertItem = string.Format(@"select '{0}',{1}", row["name"], Convert.ToInt32(row["ID"]);
                insertStatement[j] = insertItem;
                if (j % (batchSize-1) == 0 && j > 0)
                {
                    var finalQuery = @" INSERT INTO employee (id, name)
     " + String.Join(" union ", insertStatement);
                    using (var cmd = new OdbcCommand(finalQuery, connection, tran))
                    {
                        cmd.ExecuteNonQuery();
                    }
                    j = 0;
                    continue;
                }
                else
                {
                    j = j + 1;
                }
            }

            if (j > 0)
            {

                var finalQuery = @"INSERT INTO employee (id, name)
     " + String.Join(" union ", insertStatement,0,j-1);
                using (var cmd = new OdbcCommand(finalQuery, connection, tran))
                {
                    cmd.ExecuteNonQuery();
                }
            }

            tran.Commit();
        }
        catch
        {
            tran.Rollback();
            throw;
        }
    }
}
Daniel Hilgarth
  • 171,043
  • 40
  • 335
  • 443
Brijesh Mishra
  • 2,738
  • 1
  • 21
  • 36
  • 2
    The answer lies in the code itself. In the first approach you are making multiple calls from your application to SQL.. Where as 2 approach you make single call with a query. Run SQL Profiler and you will come to know the difference. – sajoshi Mar 01 '11 at 07:38
  • 4
    Possibly a stupid question, but have you looked at this: [Performing Batch Operations Using DataAdapters (ADO.NET)](http://msdn.microsoft.com/en-us/library/aadf8fk2.aspx). It's basically what you're doing in the second approach, just more correct. – Roman Mar 01 '11 at 19:16
  • Thanks romanarmy you answered it – Brijesh Mishra Mar 02 '11 at 03:39

1 Answers1

26

You want to deposit three thousand dollars in your bank account. Which is faster:

  • wait for a teller
  • take a dollar out of your wallet
  • show your id to the teller
  • deposit the dollar
  • go to the end of the line
  • repeat the whole process 2999 more times, then go home.

or

  • wait for a teller
  • take three thousand dollars out of your wallet
  • show your id to the teller
  • deposit the three thousand dollars
  • go home

?

It should be fairly obvious that the first one is a lot slower than the second one. Now is it clear why the first technique is hundreds of times slower than the second?

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • +1, great analogy. Btw, the three thousand dollars should be in one dollar bills :) – SWeko Mar 01 '11 at 07:49
  • But Eric, but what if I don't have 3,000 dollars? +1 BTW :) – Binary Worrier Mar 01 '11 at 08:04
  • But in many places have read approach 2 to be not a recommended solution – Brijesh Mishra Mar 01 '11 at 08:17
  • @Eric, you have truly big wallet to keep 3000 notes of a single dollar bill :) – bestsss Mar 01 '11 at 08:46
  • 4
    @Brijesh: OK, what did those many places you read have to say about *why* the fast approach is a bad idea? – Eric Lippert Mar 01 '11 at 14:46
  • I usually would prefer avoid approach 2 in favor of something specifically designed for bulk insertion such as `SqlBulkCopy`. See also [this answer](http://stackoverflow.com/questions/4849270/how-to-add-more-than-10000-parameters-to-a-single-sql-command/4849365#4849365) to a different question. – Brian Mar 02 '11 at 03:47