
I am reading from a CSV file and sending the data as a table variable to a stored procedure. From what I have tested so far, I am able to process 300k records in 3 minutes 30 seconds. The file may contain up to millions of records as we go. I wanted to know if it's a good idea to send all these records to the stored procedure in one go, or should I send them in batches of, say, 500k? I have set the command timeout to 1800.

user1110790
  • Do you mean table variable or table-valued parameter? – Aaron Bertrand Sep 17 '12 at 21:35
  • Yes, you are right, I am using a TVP to send the data to the stored procedure. – user1110790 Sep 17 '12 at 21:47
  • I thought @MartinSmith posted some information about thresholds for optimal number of rows to pass into a TVP, but I'm having a hard time finding his post now. – Aaron Bertrand Sep 17 '12 at 22:13
  • I use a TVP to load data; I don't think you need to break it up. If you trace it, it seems to process one row at a time. How are you implementing it? Are you using a class that implements IEnumerable? I found that faster than a DataTable, and it uses less memory. – paparazzo Sep 17 '12 at 22:13
  • @AaronBertrand If you are referring to this, it was Insert Values: http://stackoverflow.com/questions/8635818/multiple-insert-statements-vs-single-insert-with-multiple-values – paparazzo Sep 17 '12 at 22:33
  • I thought there was another but I may be remembering wrong. – Aaron Bertrand Sep 17 '12 at 22:34
  • In SQL Profiler, it shows a row-by-row insert. I am using a DataTable. I had problems implementing what you have suggested. – user1110790 Sep 18 '12 at 00:44

1 Answer


An example of using IEnumerable<SqlDataRecord>.
It works kind of like a reverse DataReader.

Notice that I sort. The sort is by the clustered index; fragmentation of the indexes will absolutely kill load speed. The first implementation used Insert Values (unsorted); over a 12-hour run, this version is literally 100x faster. I also disable indexes other than the PK and reindex at the end of the load. In a long run I am getting about 500 rows/second; your sample is 1,400/second, so great. If you start to see degradation, those are the things to look at.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.SqlServer.Server;

// Assumed shape of the row class, inferred from its usage below.
public class DocFTSinX
{
    public int Word1 { get; set; }
    public int Word2 { get; set; }
    public long Delta { get; set; }
}

public class DocFTSinXsCollection : List<DocFTSinX>, IEnumerable<SqlDataRecord>
{
    // used by TVP for fast insert
    private int sID;
    private IEnumerable<DocFTSinX> docFTSinXs;

    IEnumerator<SqlDataRecord> IEnumerable<SqlDataRecord>.GetEnumerator()
    {
        // TODO fix the order in 3 to sID, wordID1, wordID2
        // A single SqlDataRecord is reused for every row; ADO.NET reads each
        // record as it is yielded, so this is safe and avoids allocations.
        var sdr = new SqlDataRecord(
            new SqlMetaData("wordID1", System.Data.SqlDbType.Int),
            new SqlMetaData("wordID2", System.Data.SqlDbType.Int),
            new SqlMetaData("sID", System.Data.SqlDbType.Int),
            new SqlMetaData("Delta", System.Data.SqlDbType.Int));

        // Sort to match the clustered index so the insert stays sequential.
        foreach (DocFTSinX oh in docFTSinXs.OrderBy(x => x.Word1).ThenBy(x => x.Word2))
        {
            sdr.SetInt32(0, oh.Word1);
            sdr.SetInt32(1, oh.Word2);
            sdr.SetInt32(2, sID);
            sdr.SetInt32(3, (Int32)oh.Delta);
            yield return sdr;
        }
    }

    public DocFTSinXsCollection(int SID, IEnumerable<DocFTSinX> DocFTSinXs)
    {
        sID = SID;
        docFTSinXs = DocFTSinXs;
        //Debug.WriteLine("DocFTSinXsCollection DocFTSinXs " + DocFTSinXs.Count().ToString());
    }
}
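
For completeness, here is a minimal sketch of calling a stored procedure with this collection as a TVP, including the disable/rebuild step described above. The table type dbo.DocFTSinXType, the procedure dbo.InsertDocFTSinX, and the index name IX_DocFTSinX_Delta are hypothetical stand-ins for your own schema, not names from the original question.

using System.Data;
using System.Data.SqlClient;

public static class TvpLoader
{
    // Hypothetical usage sketch; all schema object names are placeholders.
    public static void LoadViaTvp(string connectionString, DocFTSinXsCollection rows)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Disable secondary indexes before the load (the PK stays enabled),
            // as described above; IX_DocFTSinX_Delta is a hypothetical name.
            using (var disable = new SqlCommand(
                "ALTER INDEX IX_DocFTSinX_Delta ON dbo.DocFTSinX DISABLE", conn))
                disable.ExecuteNonQuery();

            using (var cmd = new SqlCommand("dbo.InsertDocFTSinX", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;
                cmd.CommandTimeout = 1800;
                var p = cmd.Parameters.Add("@rows", SqlDbType.Structured);
                p.TypeName = "dbo.DocFTSinXType";
                p.Value = rows; // streamed one SqlDataRecord at a time
                // Note: an empty enumeration throws; send DBNull.Value instead.
                cmd.ExecuteNonQuery();
            }

            // Rebuild all indexes once the load finishes.
            using (var rebuild = new SqlCommand(
                "ALTER INDEX ALL ON dbo.DocFTSinX REBUILD", conn))
                rebuild.ExecuteNonQuery();
        }
    }
}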

Other tools to consider are the SqlBulkCopy .NET class and Dapper.
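
A minimal SqlBulkCopy sketch, assuming the rows are already in a DataTable whose columns match the target table (the table name is a hypothetical placeholder):

using System.Data;
using System.Data.SqlClient;

public static class BulkLoader
{
    public static void LoadViaBulkCopy(string connectionString, DataTable table)
    {
        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.DocFTSinX"; // hypothetical target
            bulk.BatchSize = 100000;     // rows sent per round trip
            bulk.BulkCopyTimeout = 1800; // seconds, matching the question
            bulk.WriteToServer(table);   // also accepts an IDataReader
        }
    }
}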

The OP asked how to send the data in batches; a concrete sketch follows the loop below.

while (true)
{
    // if there are no more rows, break
    // fill the list or DataTable with the next 100,000 rows
    // send the list or DataTable to the database
}
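
A sketch of that loop in C#, assuming the rows are already in a List<DocFTSinX> and reusing the hypothetical LoadViaTvp helper sketched above:

using System;
using System.Collections.Generic;

public static class BatchLoader
{
    private const int BatchSize = 100000;

    public static void LoadInBatches(string connectionString, int sID, List<DocFTSinX> all)
    {
        for (int offset = 0; offset < all.Count; offset += BatchSize)
        {
            // Take the next slice of up to 100,000 rows; the last one may be short.
            int count = Math.Min(BatchSize, all.Count - offset);
            var batch = new DocFTSinXsCollection(sID, all.GetRange(offset, count));

            // Each batch is one stored-procedure call (LoadViaTvp is the
            // hypothetical helper sketched after the collection class above).
            TvpLoader.LoadViaTvp(connectionString, batch);
        }
    }
}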
paparazzo
  • Thank you, Blam. I actually tested this and it was very efficient. But the DBA and management want me to send 100k records per request. Do you know how we can handle that? I mean, send 100k requests to the DataTable, or how? – user1110790 Sep 20 '12 at 14:20
  • 1
    See update at end of question. Request is not the correct term. It is an insert. A TVP implementation is going to send the insert one record at a time regardless of the batch size. You already stated you saw that in the SQL profiler. – paparazzo Sep 20 '12 at 14:44
  • Yes, @Blam, thank you for correcting me. I am new to this and appreciate your help. So the question would be: in C#, is there a way I can send 100k inserts to the TVP? – user1110790 Sep 20 '12 at 15:13
  • I ended up using the method suggested here: http://stackoverflow.com/questions/769373/whats-a-clean-way-to-break-up-a-datatable-into-chunks-of-a-fixed-size-with-linq – user1110790 Sep 24 '12 at 16:51