
I have decided to replace/convert my DataTable with an IDataReader due to memory issues.

After a fair amount of Google & MSDN searching, I came across this post at http://www.michaelbowersox.com/2011/12/22/using-a-custom-idatareader-to-stream-data-into-a-database/ and "Bulk Insert Sql Server millions of record".

I'm using the LumenWorks Fast CSV Reader, and I haven't yet figured out how to tell the CsvReader, via IDataReader, to handle the 2 different field versions. :-( csvReader.FieldCount is the key here, but I don't see how to tell CsvReader to use either of the 2 new classes implementing the IDataReader interface. See the original script and modified script below... Thanks...

//Original scripts...

var dataTable = new DataTable();   //DataTable schema (columns) is set up elsewhere
var dbConnection = new SqlConnection(_dbConnectionString);

using (var dbBulkCopy = new SqlBulkCopy(dbConnection))
{
   using (CsvReader csvReader = new CsvReader(new StreamReader(filePath), false, '|', '"', '\\', '#', ValueTrimmingOptions.UnquoteOnly))
   {
       while(csvReader.ReadNextRecord())
       {
           var dataRow = dataTable.NewRow();   //create a new row for each CSV record

           if (csvReader.FieldCount == 48)
           {
               //Version 1...
               dataRow["DealerId"] = csvReader[0];
               dataRow["DealerName"] = csvReader[1];
               //Etc...
           }
           else if (csvReader.FieldCount == 51)
           {
               //Version 2...
               dataRow["DealerId"] = csvReader[0];
               dataRow["DealerName"] = csvReader[1];
               //Etc...
           }
           else { throw new Exception("Field Total Count Mismatched"); }

           dataTable.Rows.Add(dataRow);
       }

       dbConnection.Open();

       dbBulkCopy.WriteToServer(dataTable);
   }
}

//New script...

 var dbConnection = new SqlConnection(_dbConnectionString);

using (var dbBulkCopy = new SqlBulkCopy(dbConnection))
{
   dbConnection.Open();

   using (CsvReader csvReader = new CsvReader(new StreamReader(filePath), false, '|', '"', '\\', '#', ValueTrimmingOptions.UnquoteOnly))
   {
       csvReader.ReadNextRecord();   //read one record so FieldCount is known

       //Pseudo-code -- this is the part I can't figure out...
       if (csvReader.FieldCount == 48)
       {
           //Version 1...
           dbBulkCopy.WriteToServer( csvReader....???  /* assign a custom class implementing IDataReader */ );
       }
       else if (csvReader.FieldCount == 51)
       {
           //Version 2...
           dbBulkCopy.WriteToServer( csvReader....???  /* assign a custom class implementing IDataReader */ );
       }
       else { throw new Exception("Field Total Count Mismatched"); }
   }
}

//Sample Script...

using (var file = new StreamReader(path))
using (var csv = new CsvReader(file, true)) // true = has header row
using (var bcp = new SqlBulkCopy(connection)) {
    bcp.DestinationTableName = "TableName";
    bcp.WriteToServer(csv);
}
  • I would suggest taking a look at [this post](http://stackoverflow.com/questions/7791563/load-very-big-csv-files-into-s-sql-server-database). Reading CSVs into data tables and then inserting them can be painful. And for your specific case, I once had a similar issue. The CSV file I was receiving wasn't formatted uniformly, so I added a "step 0" before all other treatment; this step consisted of removing badly formatted lines and/or reformatting them properly [you can even split the file to create 2: one for 48 fields, the other for 51 fields]. Then, using SqlBulkCopy, you can easily transfer the data. – Mechanical Object Jul 24 '13 at 00:19
  • I added the "Sample Script" code from the provided web link. I'm not sure what the "step 0" you're referring to is. Let's say I have 23 files in a folder and each one of them is run one at a time in a loop. 13 files have 48 fields and 10 files have 51 fields on all rows. The database table has 51 fields. I would have no problem with the 51-field flat files, but for the 48-field flat files I don't see how to tell the IDataReader class to use default values for the 4 missing fields. – fletchsod Jul 24 '13 at 12:29
  • I wrote an answer since the comments wouldn't offer sufficient space. – Mechanical Object Jul 28 '13 at 20:27

1 Answer


Since it'll be a bit long, I am writing this as an answer.

*I am assuming that, even though you have two kinds of CSV file coming in with different field layouts, the target table is the same.* [EDIT] No need to assume; you confirmed it in your comment.

To understand your context, I got some sample data from here.

Let's say that the first type of file is something like the following:

Rk,Year,Age,Tm,Lg,Id,G,W,L,W-L,Finish
1,1978,37,Atlanta Braves,NL,,162,69,93,.426,6

And the second type is like this (some columns are swapped, Age <-> Finish, and there are additional fields):

Rk,Year,Finish,Tm,Lg,Id,G,W,L,W-L,Age,Test1,Test2,Test3
1,1978,Very good year,Atlanta Braves,NL,,162,69,93,.426,96,,,

So the target table would look something like this (columns only):

Rk,Year,Finish,Tm,Lg,Id,G,W,L,W-L,Age,Test1,Test2,Test3

I see two main options here (plus two more at the end):

OPTION 1

  1. Add a "step 0" to make all the input files uniform at the field level by defining a single field format. This can be done by using the same fields that you have in the database.

[Let's imagine that Test4 and Test5 are columns that exist in the target table but are missing from both CSV files]

Rk,Year,Finish,Tm,Lg,Id,G,W,L,W-L,Age,Test1,Test2,Test3,Test4,Test5

  2. Parse all the files you have (both types) and rewrite them into one file (or multiple, that's up to you) respecting the format you defined. This way, you have only one file (or several) in a single, uniform format. (A rough sketch of such a "step 0" is shown right after this list.)

  3. You can now parse this file to insert it into the database using the CSV reader, since the field-incompatibility problem has been handled by the uniform-format file you obtained.
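For illustration, here is a minimal sketch of what such a "step 0" could look like. The folder paths, delimiter and field counts below are assumptions for illustration, and it assumes the short (48-field) lines are only missing trailing fields; it simply pads each line out to the full field count so one uniform format remains:

//Rough sketch of a "step 0" normalization pass (assumed paths, delimiter and counts).
//Note: the naive Split/pad below ignores quoted delimiters -- good enough as a sketch only.
const char delimiter = '|';
const int targetFieldCount = 51;

foreach (var inputPath in Directory.GetFiles(@"D:\Data\Incoming", "*.csv"))
{
    var outputPath = Path.Combine(@"D:\Data\Normalized", Path.GetFileName(inputPath));

    using (var reader = new StreamReader(inputPath))
    using (var writer = new StreamWriter(outputPath))
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            var missing = targetFieldCount - line.Split(delimiter).Length;

            if (missing > 0)
            {
                //Pad short (48-field) lines with empty trailing fields.
                line += new string(delimiter, missing);
            }

            writer.WriteLine(line);
        }
    }
}

After that, the uniform-format files can be bulk-loaded exactly like the "Sample Script" shown in the question.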

OPTION 2

You will do the SqlBulkCopy operation twice. In the first round you'll read the files having 48 fields, and in the next round you'll read the files having 51 fields.

var FilesWith48Fields = Directory.GetFiles(@"D:\Data\48Fields", "*.csv");

foreach (var fileName in FilesWith48Fields)
{
    using (var file = new StreamReader(fileName))
    using (var csv = new CsvReader(file, true)) // true = has header row
    using (var bcp = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepNulls))
    {
        bcp.DestinationTableName = "fletchsodTable";

        // map the first field of your CSV to the column in Database
        var mapping1 = new SqlBulkCopyColumnMapping(0, "FirstColumnName");
        bcp.ColumnMappings.Add(mapping1);

        var mapping2 = new SqlBulkCopyColumnMapping(1, "SecondColumnName");
        bcp.ColumnMappings.Add(mapping2);
        // ... map the remaining fields the same way ...

        bcp.WriteToServer(csv);
    }
}

And repeat the same with the files having 51 fields:

var FilesWith51Fields = Directory.GetFiles(@"D:\Data\51Fields", "*.csv");
......
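The 51-field loop would look much the same, just with mappings added for the extra source fields (the column names and indexes below are placeholders, not your real schema). Note that for the 48-field files, any destination columns you never map should simply be left to their database default values or NULL, which is what the "missing fields" concern in the comments is about:

foreach (var fileName in FilesWith51Fields)
{
    using (var file = new StreamReader(fileName))
    using (var csv = new CsvReader(file, true)) // true = has header row
    using (var bcp = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepNulls))
    {
        bcp.DestinationTableName = "fletchsodTable";

        // Same mappings as the 48-field pass...
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(0, "FirstColumnName"));
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(1, "SecondColumnName"));
        // ...

        // ...plus the source fields that only exist in the 51-field layout
        // (placeholder indexes/names -- adjust to your real schema).
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(48, "ExtraColumnName1"));
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(49, "ExtraColumnName2"));
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(50, "ExtraColumnName3"));

        bcp.WriteToServer(csv);
    }
}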

More information on SqlBulkCopyColumnMapping can be found here.

OPTION 3

In case you want to venture into creating your own data reader, here are some links:

Daniel Wertheim's blog

A sample implementation on CodeProject

Another one

And finally MSDN
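To give an idea of what that would involve, below is a rough, untested sketch of such a reader built on top of the LumenWorks CsvReader. The class name, the hard-wired field count and the assumption that the 48-field files are just missing the trailing fields of the 51-field layout are all mine, not from your post. The wrapper always reports 51 fields and returns DBNull for anything the current record doesn't have, so a single WriteToServer call can handle both file versions; only the members SqlBulkCopy typically needs (Read, FieldCount, GetValue) do real work, the rest are stubs.

//Rough sketch only -- assumes System, System.Data and LumenWorks.Framework.IO.Csv namespaces.
public class PaddedCsvDataReader : IDataReader
{
    private readonly CsvReader _csv;
    private const int TargetFieldCount = 51;   //width of the destination table

    public PaddedCsvDataReader(CsvReader csv) { _csv = csv; }

    public int FieldCount { get { return TargetFieldCount; } }

    public bool Read() { return _csv.ReadNextRecord(); }

    public object GetValue(int i)
    {
        //Fields the current file doesn't supply come back as DBNull,
        //so the database falls back to column defaults / NULL.
        return i < _csv.FieldCount ? (object)_csv[i] : DBNull.Value;
    }

    public bool IsDBNull(int i) { return GetValue(i) == DBNull.Value; }

    public int GetValues(object[] values)
    {
        int count = Math.Min(values.Length, TargetFieldCount);
        for (int i = 0; i < count; i++) values[i] = GetValue(i);
        return count;
    }

    public void Close() { }
    public bool IsClosed { get { return false; } }
    public void Dispose() { _csv.Dispose(); }

    //The members below aren't needed for a plain ordinal-mapped WriteToServer, so they are stubs.
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { throw new NotSupportedException(); } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public DataTable GetSchemaTable() { throw new NotSupportedException(); }
    public bool NextResult() { return false; }
    public string GetName(int i) { throw new NotSupportedException(); }
    public int GetOrdinal(string name) { throw new NotSupportedException(); }
    public Type GetFieldType(int i) { return typeof(string); }
    public string GetDataTypeName(int i) { throw new NotSupportedException(); }
    public string GetString(int i) { throw new NotSupportedException(); }
    public bool GetBoolean(int i) { throw new NotSupportedException(); }
    public byte GetByte(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public char GetChar(int i) { throw new NotSupportedException(); }
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
    public DateTime GetDateTime(int i) { throw new NotSupportedException(); }
    public decimal GetDecimal(int i) { throw new NotSupportedException(); }
    public double GetDouble(int i) { throw new NotSupportedException(); }
    public float GetFloat(int i) { throw new NotSupportedException(); }
    public Guid GetGuid(int i) { throw new NotSupportedException(); }
    public short GetInt16(int i) { throw new NotSupportedException(); }
    public int GetInt32(int i) { throw new NotSupportedException(); }
    public long GetInt64(int i) { throw new NotSupportedException(); }
}

It could then be used roughly like this, reusing the reader settings from your question:

using (var csvReader = new CsvReader(new StreamReader(filePath), false, '|', '"', '\\', '#', ValueTrimmingOptions.UnquoteOnly))
using (var paddedReader = new PaddedCsvDataReader(csvReader))
using (var bcp = new SqlBulkCopy(dbConnection))
{
    dbConnection.Open();
    bcp.DestinationTableName = "fletchsodTable";
    bcp.WriteToServer(paddedReader);
}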

PERSONAL NOTE Due to lack of time, for a similar problem of mine I gave up on the 3rd option, because with all the unit tests, optimization and other tweaks you'll have to do, it can take a lot of time (at least for me, it was the case).

OPTION 4 Perhaps, with the column mapping I pointed out in OPTION 2, you would want to implement it your own way by testing field counts. But instinctively, I would suggest not relying on hard-coded integer field counts scattered through the code.
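If you do go that way, here is a small sketch of what that might look like: keep the two known layouts behind named constants and a single mapping helper, rather than repeating 48/51 literals everywhere. The constant names, column names and indexes are placeholders, not your real schema:

//Hypothetical constants and helper -- adjust names and mappings to your schema.
private const int Version1FieldCount = 48;
private const int Version2FieldCount = 51;

private static void AddColumnMappings(SqlBulkCopy bcp, int fieldCount)
{
    //Mappings shared by both layouts.
    bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(0, "FirstColumnName"));
    bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(1, "SecondColumnName"));
    // ...

    if (fieldCount == Version2FieldCount)
    {
        //Extra fields that only the 51-field layout provides.
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(48, "ExtraColumnName1"));
        // ...
    }
    else if (fieldCount != Version1FieldCount)
    {
        throw new InvalidOperationException("Unexpected field count: " + fieldCount);
    }
}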