
I am using C# to parse a CSV file and export it to a SQL Server database table. The schema of the database table is almost identical to that of the CSV file, except that the table has an identity primary key column as its first column.

The problem: the 2nd column of the database table, which should receive the 1st column of the CSV file, is actually receiving the 2nd column of the CSV file. The code assumes that the table's first column, the PK identity column, is the first column to be written to from the CSV file. In case this is confusing, assume columns 1, 2, and 3 of the CSV file have headers called Contoso1, Contoso2 and Contoso3, respectively. The database table's columns 1 through 4 are called Id, Contoso1, Contoso2, and Contoso3, respectively. During the export, the Id column correctly gets populated with the identity id, but then the Contoso1 column of the database table gets populated with the Contoso2 column of the CSV file, and that off-by-one continues for all 300 columns.

Here is the code. I'm looking for a way to apply a one-column offset in this code. If possible, I'd like to avoid hardcoding a mapping scheme, as there are 300+ columns.

using System;
using System.Data.SqlClient;
using System.Data;
using Microsoft.VisualBasic.FileIO;

namespace CSVTest
{
    class Program
    {
        static void Main(string[] args)
        {

            string csv_file_path = @"pathToCsvFile";
            DataTable csvData = GetDataTabletFromCSVFile(csv_file_path);
            Console.WriteLine("Rows count:" + csvData.Rows.Count);
            InsertDataIntoSQLServerUsingSQLBulkCopy(csvData);
        }



        private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
        {
            DataTable csvData = new DataTable();
            try
            {
                using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
                {
                    csvReader.SetDelimiters(new string[] { "," });
                    csvReader.HasFieldsEnclosedInQuotes = true;
                    string[] colFields = csvReader.ReadFields();
                    foreach (string column in colFields)
                    {
                        DataColumn dataColumn = new DataColumn(column);
                        dataColumn.AllowDBNull = true;
                        csvData.Columns.Add(dataColumn);
                    }
                    while (!csvReader.EndOfData)
                    {
                        string[] fieldData = csvReader.ReadFields();
                        //Treat empty values as null
                        for (int i = 0; i < fieldData.Length; i++)
                        {
                            if (fieldData[i] == "")
                            {
                                fieldData[i] = null;
                            }
                        }
                        csvData.Rows.Add(fieldData);
                    }
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine("Failed to parse CSV: " + ex.Message);
                return null;
            }
            return csvData;
        }


        static void InsertDataIntoSQLServerUsingSQLBulkCopy(DataTable csvFileData)
        {
            using (SqlConnection dbConnection = new SqlConnection("Data Source=localhost;Initial Catalog=Database_Name;Integrated Security=SSPI;"))
            {
                dbConnection.Open();
                using (SqlBulkCopy s = new SqlBulkCopy(dbConnection))
                {
                    s.DestinationTableName = "TableName";
                    //foreach (var column in csvFileData.Columns)
                        //s.ColumnMappings.Add(column.ToString(), column.ToString());
                    s.WriteToServer(csvFileData);
                }
            }
        }
    }
}
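For context on why the import is off by one: when no ColumnMappings are added, SqlBulkCopy maps source columns to destination columns strictly by ordinal position, so CSV column 1 lands in the table's Id column. The commented-out loop in the code above is one way around this; a sketch of that name-based variant (assuming the CSV header names exactly match the table's column names, and reusing the placeholder connection string and table name from the question) might look like:

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

class NameMappingDemo
{
    // Sketch only: map each source column to the destination column of the
    // same name. Assumes the CSV headers (Contoso1..Contoso300) exactly match
    // the table's column names; the Id identity column is never mapped, so
    // SQL Server generates its values during the import.
    public static void AddNameMappings(SqlBulkCopy bulkCopy, DataTable source)
    {
        foreach (DataColumn column in source.Columns)
            bulkCopy.ColumnMappings.Add(column.ColumnName, column.ColumnName);
    }

    static void Main()
    {
        var csvFileData = new DataTable();
        csvFileData.Columns.Add("Contoso1");
        csvFileData.Columns.Add("Contoso2");

        // Placeholder connection string; SqlBulkCopy does not connect
        // until WriteToServer is called.
        using (var s = new SqlBulkCopy("Data Source=localhost;Initial Catalog=Database_Name;Integrated Security=SSPI;"))
        {
            s.DestinationTableName = "TableName";
            AddNameMappings(s, csvFileData);
            Console.WriteLine(s.ColumnMappings.Count);  // 2
            // s.WriteToServer(csvFileData);  // would perform the actual import
        }
    }
}
```

Because the Id column never appears in the mappings, the identity values are assigned by SQL Server rather than taken from the source.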
Stpete111
  • Just for argument's sake - what happens if your csv has a dummy column as the first column, i.e. in your example, the csv columns would be Dummy, Contoso1, Contoso2 and Contoso3? If that fixes your SQL import issue, you could have a variable that holds the numeric position of the ID column (say position 2, count starts at 0) and then add a dummy column at that position in your csv. – Rakesh Oct 11 '19 at 02:54
  • Hi Rakesh - this is exactly the kind of idea I was looking for. So I tried inserting a dummy column in the csv, and yes, that does work to fix the SQL issue. Now, with that said, when we go to a production environment, I won't have control over the csv file (however, its schema should not ever change). So, would you be able to formulate an answer with a code example of the variable you speak of? If it works, I will of course mark it as the answer. Thanks. – Stpete111 Oct 11 '19 at 13:41
  • Is the location of the ID column known ahead of time, and will it always be the same for a "specific" type of file? I.e. let's say you have 3 CSVs (and 3 tables that they map to) - let's call them A, B and C. For file A the ID column is position 1, for B it's position 4 and for C it's position 5, and these positions NEVER change. Does that apply to your case? – Rakesh Oct 11 '19 at 13:44
  • Question 2 - Do you just need to skip at most one column in the CSVs? – Rakesh Oct 11 '19 at 13:55

1 Answer

I'm assuming that:

a. only one column needs to be skipped, but this can be modified to skip multiple columns;

b. you know, ahead of time, the zero-based index of the column to skip.

With that out of the way, here are the 3 modifications you need to make.

  1. Add the variable to store the index to skip
string csv_file_path = @"pathToCsvFile";
//Assuming just one index for the column number to skip - zero based counting. 
//perhaps read from the AppConfig
int columnIndexToSkip = 0;
DataTable csvData = GetDataTabletFromCSVFile(csv_file_path, columnIndexToSkip);
  2. Modify the function signature to take the extra int parameter
        private static DataTable GetDataTabletFromCSVFile(string csv_file_path, int columnIndexToSkip)
        {
  3. Add the dummy column at that index
                            csvData.Rows.Add(fieldData);
                        }
                    }// end of while (!csvReader.EndOfData) loop

                    if (columnIndexToSkip >= 0)
                    {
                        csvData.Columns.Add("DUMMY").SetOrdinal(columnIndexToSkip);
                    }

I've not tested the import, but the updated CSV data looks good to me.
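Taken in isolation, the skip logic from step 3 can be sketched as a small helper. This is illustrative only; the class and method names here are mine, not from the original code:

```csharp
using System;
using System.Data;
using System.Linq;

class SkipColumnDemo
{
    // Hypothetical helper isolating step 3: insert a placeholder column at
    // the ordinal the destination table's identity column occupies.
    public static DataTable InsertPlaceholderColumn(DataTable csvData, int columnIndexToSkip)
    {
        if (columnIndexToSkip >= 0)
        {
            // Columns added after rows already exist hold DBNull for every
            // row, so SqlBulkCopy's default ordinal mapping pairs this column
            // with Id, whose identity values SQL Server generates itself.
            csvData.Columns.Add("DUMMY").SetOrdinal(columnIndexToSkip);
        }
        return csvData;
    }

    static void Main()
    {
        var csvData = new DataTable();
        csvData.Columns.Add("Contoso1");
        csvData.Columns.Add("Contoso2");
        csvData.Rows.Add("a", "b");

        InsertPlaceholderColumn(csvData, 0);
        Console.WriteLine(string.Join(",",
            csvData.Columns.Cast<DataColumn>().Select(c => c.ColumnName)));
        // prints "DUMMY,Contoso1,Contoso2"
    }
}
```

`DataColumnCollection.Add` appends the new column at the end, and `SetOrdinal` then moves it to the requested position, shifting the CSV columns right by one so they line up with the table.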

Rakesh
  • Rakesh - thank you I will try this now. Both assumptions a. and b. are correct. I will revert soon. – Stpete111 Oct 11 '19 at 17:30
  • Rakesh - if I change your `columnIndexToSkip` variable from 2 to 0, I get the desired results. Perhaps my explanation was confusing, but in any case, using 0 works. If you can update your answer to change it from 2 to 0, I'll mark it as the answer. Thanks! – Stpete111 Oct 11 '19 at 19:26
  • Perfect - 2 was just a value I set as an example; the right value is decided by the CSV and the table you are looking to update. In the example you provided, the Id is at index 0 and that is what you need to skip, so `columnIndexToSkip` needs to be 0. – Rakesh Oct 11 '19 at 20:08