7

I'm looking for the fastest way to load bulk data via C#. I have this script that does the job, but slowly. I've read claims that SqlBulkCopy is the fastest.
1,000 records take about 2.5 seconds, and the files contain anywhere from 5,000 to 250k records. What are some of the things that can slow it down?

Table Def:

CREATE TABLE [dbo].[tempDispositions](
    [QuotaGroup] [varchar](100) NULL,
    [Country] [varchar](50) NULL,
    [ServiceGroup] [varchar](50) NULL,
    [Language] [varchar](50) NULL,
    [ContactChannel] [varchar](10) NULL,
    [TrackingID] [varchar](20) NULL,
    [CaseClosedDate] [varchar](25) NULL,
    [MSFTRep] [varchar](50) NULL,
    [CustEmail] [varchar](100) NULL,
    [CustPhone] [varchar](100) NULL,
    [CustomerName] [nvarchar](100) NULL,
    [ProductFamily] [varchar](35) NULL,
    [ProductSubType] [varchar](255) NULL,
    [CandidateReceivedDate] [varchar](25) NULL,
    [SurveyMode] [varchar](1) NULL,
    [SurveyWaveStartDate] [varchar](25) NULL,
    [SurveyInvitationDate] [varchar](25) NULL,
    [SurveyReminderDate] [varchar](25) NULL,
    [SurveyCompleteDate] [varchar](25) NULL,
    [OptOutDate] [varchar](25) NULL,
    [SurveyWaveEndDate] [varchar](25) NULL,
    [DispositionCode] [varchar](5) NULL,
    [SurveyName] [varchar](20) NULL,
    [SurveyVendor] [varchar](20) NULL,
    [BusinessUnitName] [varchar](25) NULL,
    [UploadId] [int] NULL,
    [LineNumber] [int] NULL,
    [BusinessUnitSubgroup] [varchar](25) NULL,
    [FileDate] [datetime] NULL
) ON [PRIMARY]

And here's the code:

    private void BulkLoadContent(DataTable dt)
    {
        OnMessage("Bulk loading records to temp table");
        OnSubMessage("Bulk Load Started");
        using (SqlBulkCopy bcp = new SqlBulkCopy(conn))
        {
            bcp.DestinationTableName = "dbo.tempDispositions";
            bcp.BulkCopyTimeout = 0;
            foreach (DataColumn dc in dt.Columns)
            {
                bcp.ColumnMappings.Add(dc.ColumnName, dc.ColumnName);
            }
            bcp.NotifyAfter = 2000;
            bcp.SqlRowsCopied += new SqlRowsCopiedEventHandler(bcp_SqlRowsCopied);
            bcp.WriteToServer(dt);
            bcp.Close();
        }
    }
Chris Hayes
  • Slow as compared to what? How many records? What other approaches have you tried? Is this "slow" as in "gee, I better go grab some coffee" or as in "I could type it in by hand faster than this"? – GalacticCowboy Apr 22 '10 at 16:38
  • Slow as in "I better go grab some coffee" 1000 records 2.5 seconds. files contain anywhere near 5000 records to 250k – Chris Hayes Apr 22 '10 at 16:40
  • Does your db have a `LOAD DATA INFILE` function and can you call it directly? – dnagirl Apr 22 '10 at 16:49
  • FYI: I deployed the solution 'as-is' and it flew. It was off the charts. I guess, while in debug mode, there's other plumbing going on that impacts performance. – Chris Hayes Apr 22 '10 at 23:27

7 Answers

8

Do you have any indexes, triggers or constraints on that table?

That will cause slowdowns on insert - a clustered index in particular would hurt. When blasting in the amount of data you're doing, it's best to drop the indexes first and re-apply them afterwards.

A good post about it is here: What's the fastest way to bulk insert a lot of data in SQL Server (C# client)
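
For illustration (my addition, not part of this answer): a minimal sketch of that drop-and-recreate pattern, assuming a hypothetical nonclustered index named IX_tempDispositions_TrackingID on the destination table and an already-open connection.

    // Sketch only: the index name and column are illustrative, not from the question.
    // Requires System.Data and System.Data.SqlClient; conn is assumed to be open.
    private void BulkLoadWithoutIndex(SqlConnection conn, DataTable dt)
    {
        // Drop the index so SQL Server doesn't have to maintain it row by row during the load.
        using (var cmd = new SqlCommand(
            "DROP INDEX IX_tempDispositions_TrackingID ON dbo.tempDispositions;", conn))
        {
            cmd.ExecuteNonQuery();
        }

        using (var bcp = new SqlBulkCopy(conn))
        {
            bcp.DestinationTableName = "dbo.tempDispositions";
            bcp.BulkCopyTimeout = 0;
            bcp.WriteToServer(dt);
        }

        // Rebuild the index once, after all the rows are in.
        using (var cmd = new SqlCommand(
            "CREATE NONCLUSTERED INDEX IX_tempDispositions_TrackingID " +
            "ON dbo.tempDispositions (TrackingID);", conn))
        {
            cmd.ExecuteNonQuery();
        }
    }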

Frederik
4

If you have lots of data, setting the BatchSize to a reasonably large number might help:

bcp.BatchSize = 10000;
zszep
1

Things that can slow down the bulk copy:

- Full-text indexes on the table
- Triggers on insert
- Foreign-key constraints
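
As a side note (my addition, not part of this answer): by default SqlBulkCopy does not fire insert triggers or check check/foreign-key constraints; the SqlBulkCopyOptions flags make that behavior explicit, and TableLock often speeds up loads into a heap like the one in the question. A minimal sketch, assuming connectionString and dt already exist:

    // Sketch only: connectionString and dt are assumed to exist elsewhere.
    using (var conn = new SqlConnection(connectionString))
    using (var bcp = new SqlBulkCopy(
        conn,
        SqlBulkCopyOptions.TableLock,   // take a bulk-update lock; often faster for heaps
        null))                          // no external transaction
    {
        conn.Open();
        bcp.DestinationTableName = "dbo.tempDispositions";
        // Add SqlBulkCopyOptions.FireTriggers or SqlBulkCopyOptions.CheckConstraints
        // only if you actually need triggers/constraints enforced during the load.
        bcp.WriteToServer(dt);
    }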

Jipy
1

I've noticed that trying to flush large datasets is initially much faster, but slows down substantially over time. I've found a modest increase in performance using a buffered approach, feeding SqlBulkCopy just a few thousand records at a time under the same connection. It seems to keep the per-batch transaction time down, which, over time, improves performance. On my solution, I've noted that the same method un-buffered will save about 5,000,000 records in the time it takes this method to save about 7,500,000 records of the same type to the same DB. Hope this helps someone.

public void flush_DataTable(DataTable dt, string tableName)
{
    // My incoming DataTables have a million or so rows each and slow down over time to nothing; chunking helps.
    int bufferSize = 10000;

    if (dt.Rows.Count >= bufferSize)
    {
        using (SqlConnection conn = getConn())
        {
            conn.Open();
            int lowBuffer = 0;
            while (lowBuffer < dt.Rows.Count)
            {
                // Upper bound (exclusive) of the current chunk, capped at the row count.
                int bufferHigh = Math.Min(lowBuffer + bufferSize, dt.Rows.Count);

                using (SqlBulkCopy s = new SqlBulkCopy(conn))
                {
                    s.BulkCopyTimeout = 900;
                    s.DestinationTableName = tableName;
                    s.BatchSize = bufferSize;
                    s.EnableStreaming = true;

                    foreach (DataColumn column in dt.Columns)
                        s.ColumnMappings.Add(column.ColumnName, column.ColumnName);

                    // Copy just this chunk of rows into an empty clone of the source table.
                    DataTable bufferedTable = dt.Clone();
                    for (int bu = lowBuffer; bu < bufferHigh; bu++)
                        bufferedTable.ImportRow(dt.Rows[bu]);

                    s.WriteToServer(bufferedTable);
                }

                lowBuffer = bufferHigh;
            }
            conn.Close();
        }
    }
    else
    {
        // Perform a non-buffered flush (could just as easily flush the buffer here, but I already had the other method).
        flushDataTable(dt, tableName);
    }
}
Shannon Holsinger
  • You could maybe speed things up using a Parallel.For for the table copy and/or taking advantage of threading by allowing 2-3 parallel inserts using await s.WriteToServerAsync (see the sketch after these comments). You might also try putting the loop logic (while) inside the using(s) and after column mapping so it all happens under the same instance of SqlBulkCopy -- I was happy enough with the 50% increase in performance over time not to try some of these other tricks. YMMV – Shannon Holsinger Aug 08 '16 at 11:47
  • Also - this is slower when a table is empty - the performance gains seem to come on as the table fills. For a few million records, I doubt this would be of much use. But for hundreds of millions, it's saved me a bunch of time. – Shannon Holsinger Aug 08 '16 at 12:18
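
A rough sketch of the WriteToServerAsync idea from the first comment above (my assumption of what it could look like, not the commenter's code). Each chunk gets its own connection so a few loads can overlap; in practice you would cap the degree of parallelism at 2-3. "chunks" is a hypothetical sequence of DataTables of a few thousand rows each.

    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;
    using System.Linq;
    using System.Threading.Tasks;

    static class ParallelBulkLoad
    {
        // Sketch only: every chunk is written on its own connection so the loads can overlap.
        public static Task FlushChunksAsync(IEnumerable<DataTable> chunks, string tableName, string connectionString)
        {
            var tasks = chunks.Select(async chunk =>
            {
                using (var conn = new SqlConnection(connectionString))
                {
                    await conn.OpenAsync();
                    using (var bcp = new SqlBulkCopy(conn))
                    {
                        bcp.DestinationTableName = tableName;
                        await bcp.WriteToServerAsync(chunk);  // frees the calling thread while SQL Server works
                    }
                }
            });
            return Task.WhenAll(tasks);
        }
    }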
1

Setting BatchSize or NotifyAfter will both cause a slowdown.

If you have a data reader, you can wrap it with a row-counting data reader to get the progress.

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
{
    bulkCopy.EnableStreaming = true;
    bulkCopy.BulkCopyTimeout = 0;
    using (var reader = new MyRowCountDataReader(dataReader))
    {
        reader.NotifyAfter = 1000;
        reader.RowRead += (sender, args) => { ... };
        bulkCopy.WriteToServer(reader);
    }
}
John Gietzen
1

I just had a similar experience. SqlBulkCopy was fine with one table, but with another it was at least 10x slower to copy the same number of records.

The slow table did have non-clustered indexes and a trigger, but disabling them didn't make any appreciable difference.

It turns out the slow table had six NVARCHAR(MAX) columns. The data I was actually inserting was all fairly short, so I changed the columns to NVARCHAR(500).

Et voilà! The performance of the slow table improved to match the other table - at least a 10x speed improvement.

Paul Suart
  • I had the same experience as you with SQL Server 2019. It sometimes took minutes to store a single row in a table with one NVARCHAR(MAX) column. The strange thing is, it only happened sporadically, for no reason whatsoever. Changing the column to NVARCHAR(256) fixed the issue for me. – ZuBsPaCe Dec 02 '22 at 16:25
0

The IDataReader implementation I suggested here: How to implement IDataReader? may help you. I used it with SqlBulkCopy as follows:

using (MyFileDataReader reader = new MyFileDataReader(@"C:\myfile.txt"))
{
    SqlBulkCopy bulkCopy = new SqlBulkCopy(connection);
    bulkCopy.DestinationTableName = "[my_table]";
    bulkCopy.BatchSize = 10000;

    bulkCopy.WriteToServer(reader);

    bulkCopy.Close();
}
playful