1

Got about a 400 MB .txt file here that is delimited by '|'. Using a Windows Form with C#, I'm inserting each row of the .txt file into a table in my SQL server database.

What I'm doing is simply this (shortened by "..." for brevity):

while ((line = file.ReadLine()) != null)
{
   string[] split = line.Split(new Char[] { '|' });

   SqlCommand cmd = new SqlCommand("INSERT INTO NEW_AnnualData VALUES (@YR1984, @YR1985, ..., @YR2012)", myconn);
   cmd.Parameters.AddWithValue("@YR1984", split[0]);
   cmd.Parameters.AddWithValue("@YR1985", split[1]);
   ...
   cmd.Parameters.AddWithValue("@YR2012", split[28]);

   cmd.ExecuteNonQuery();
}

Now, this is working, but it is taking awhile. This is my first time to do anything with a huge amount of data, so I need to make sure that A) I'm doing this in an efficient manner, and that B) my expectations aren't too high.

Using a SELECT COUNT() while the loop is going, I can watch the number go up and up over time. So I used a clock and some basic math to figure out the speed that things are working. In 60 seconds, there were 73881 inserts. That's 1231 inserts per second. The question is, is this an average speed, or am I getting poor performance? If the latter, what can I do to improve the performance?

I did read something about SSIS being efficient for this purpose exactly. However, I need this action to come from clicking a button in a Windows Form, not going through SISS.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
CptSupermrkt
  • 6,844
  • 12
  • 56
  • 87

4 Answers4

4

Oooh - that approach is going to give you appalling performance. Try using BULK INSERT, as follows:

BULK INSERT MyTable
    FROM 'e:\orders\lineitem.tbl'
    WITH 
    (
        FIELDTERMINATOR ='|',
        ROWTERMINATOR ='\n'
    )

This is the best solution in terms of performance. There is a drawback, in that the file must be present on the database server. There are two workarounds for this that I've used in the past, if you don't access to the server's file system from where you're running the process. One is to install an instance of SQL Express on the workstation, add the main server as a linked server to the workstation instance, and then run "BULK INSERT MyServer.MyDatabase.dbo.MyTable...". The other option is to reformat the CSV file as XML, which can be processed very quickly, and then passing the XML to query and processing it using OPENXML. Both BULK INSERT and OPENXML are well documented on MSDN, and you'd do well to read through the examples.

Peter
  • 1,055
  • 6
  • 8
  • This simple method is very good. For the same file where Import Wizard was loading 100,000 records per minute, I have instead loaded 6.5 million records in 2 minutes: 40 times faster. – erp_da Feb 05 '21 at 20:27
2

Have a look at SqlBulkCopy on MSDN, or the nice blog post here. For me that goes up to tens of thousands of inserts per second.

Andomar
  • 232,371
  • 49
  • 380
  • 404
  • This was exactly what I needed. I had a problem using it at first because there was so much data that I was getting OutOfMemory exceptions, but this post put me on the right path: http://stackoverflow.com/questions/9442171/a-way-out-from-getting-systemoutofmemoryexception-while-importing-from-large-tex – CptSupermrkt Mar 07 '12 at 00:27
2

I'd have to agree with Andomar. I really quite like SqlBulkCopy. It is really fast (you need to play around with BatchSizes to make sure you find one that suits your situation.)

For a really in depth article discussing the various options, check out Microsoft's "Data Loading Performance Guide"; http://msdn.microsoft.com/en-us/library/dd425070(v=sql.100).aspx

Also, take a look at the C# example with SqlBulkCopy of CSV Reader. It isn't free, but if you can write a fast and accurate parser in less time, then go for it. At least, it'll give you some ideas.

Community
  • 1
  • 1
Mr Moose
  • 5,946
  • 7
  • 34
  • 69
1

I have fonud SSIS to be much faster than this type of method but there are a bunch of variables that can affect performence.

If you want to experiment with SSIS, use the Import and Export wizard in Management Studio to generate a SSIS package that will import a pipe delimited file. You can save out the package and run it from a .NET application

See this article: http://blogs.msdn.com/b/michen/archive/2007/03/22/running-ssis-package-programmatically.aspx for info on how to run an SSIS package programatically. It includes options on how to run from the client, from the server, or wherever.

Also, take a look at this article for additional ways you can improve bulk insert performance in general. http://msdn.microsoft.com/en-us/library/ms190421.aspx

Code Magician
  • 23,217
  • 7
  • 60
  • 77
  • Wouldn't this require client deployment of SSIS components? And if you upgrade your SQL Server versions, SSIS packages usually require an update too. – Andomar Mar 06 '12 at 07:04
  • The link above details multiple methods for executing an SSIS package from code, 3 of which can be run remotely including options without deploying client tools. Also, I have not needed to make any changes to my SSIS packages going from 05 to 08 to R2. YMMV. – Code Magician Mar 06 '12 at 07:13