-3

I have the following C# code which reads a CSV file and goal is to save it to a SQL table:

StreamReader sr = new StreamReader(tbCSVFileLocation.Text.ToString());
string line = sr.ReadLine();
string[] value = line.Split(',');
DataTable dt = new DataTable();
DataRow row;

foreach (string dc in value)
{
  dt.Columns.Add(new DataColumn(dc));
}

while (!sr.EndOfStream)
{
  value = sr.ReadLine().Split(',');
  if (value.Length == dt.Columns.Count)
  {
    row = dt.NewRow();
    row.ItemArray = value;
    dt.Rows.Add(row);
  }
}

The issue I am having is I don't know where the data is coming from in my table.

Here is a sample of the CSV file:

Name,Address,License Number,License Type,Year of Birth,Effective Date,Action,Misconduct Description,Date Updated "563 Grand Medical, P.C.","563 Grand Street Brooklyn, NY 11211",196275,,,09/29/2010,Revocation of certificate of incorporation.,"The corporation admitted guilt to the charge of ordering excessive tests, treatment, or use of treatment facilities not warranted by the condition of a patient.",09/29/2010 "Aaron, Joseph","2803 North 700 East Provo, Utah 84604",072800,MD,1927,01/13/1999,License Surrender,"This action modifies the penalty previously imposed by Order# 93-40 on March 31, 1993, where the Hearing Committee sustained the charge that the physician was disciplined by the Utah State Medical Board, and ordered that if he intends to engage in practice in NY State, a two-year period of probation shall be imposed.", "Aarons, Mark Gold","P.O.Box 845 Southern Pines, North Carolina 28388",161530,MD,1958,12/13/2005,"License limited until the physician's North Carolina medical license is fully restored without any conditions.The physician must also comply with the terms imposed on July 26, 2005 by the North Carolina State Medical Board. The physician has completed the monitoring terms.",The physician did not contest the charge of having been disciplined by the North Carolina State Medical Board for his addiction to drugs.,12/06/2005

When I look at my SQL table, this is what is shown:

Name    Address License Number  License Type    Year of Birth   Effective Date  Action  Misconduct Description  Date Updated                    
Orlando  FL 32836"  173309  MD  1938    2/29/2012   License surrender   The physician did not contest the charge of having had his DEA registration for Florida revoked by the U.S. Drug Enforcement Administration for improperly prescribing controlled substances.   2/22/2012                   
Miami    Florida 33156" 119545  MD  1945    10/10/2002  Censure and reprimand   The physician did not contest the charge of having been disciplined by the Florida State Board of Medicine for giving a patient excessive doses of radiation.   10/10/2002                  
Brooklyn     New York 11229"    192310          11/6/2003   Annulment of certificate of incorporation pursuant to Section 230-a of the New York State Public Health Law and Section 1503(d) of the New York State Business Corporation Law  The corporation admitted guilt to the charge of willfully failing to comply with Section 1503 of the Business Corporation Law in violation of New York State Education Law Section 6530(12).    10/31/2003                  

As you can see there is no ORLANDO for the first column for the first row. Not sure what is going on.

Please help me resolve it.

Si8
  • 9,141
  • 22
  • 109
  • 221
  • 2
    You don't parse CSV by `Split`ing a line (which is also slow and memory-inefficient). Build a state-machine parser instead. – Dai Jun 06 '14 at 19:11
  • How do I add it to my WinForm application? – Si8 Jun 06 '14 at 19:12
  • You'll need to write one yourself, or use an existing CSV-parsing library. Search online, there are plenty out there. – Dai Jun 06 '14 at 19:13
  • @Dai - Can you please tell me how a state machine parser is better than splitting ? – Erran Morad Jun 06 '14 at 19:14
  • 2
    It seems that you have to split header by comma `,` and values by double quotes `""` and `,`. – Hassan Jun 06 '14 at 19:14
  • `Sikni8`, are you familiar with Vb.Net namespace `Microsoft.VisualBasic.FileIO` TextFieldParser? I can show you some code that I have written lastweek where I parse a .csv file and save the results into a DataTable, from there I use XML to do a bulk insert into Sql Table – MethodMan Jun 06 '14 at 19:15
  • I kinda figured just need to figure out how to do it correctly. Because it's not working at the moment. – Si8 Jun 06 '14 at 19:15
  • @DJKRAZE Any and all help is accepted – Si8 Jun 06 '14 at 19:15
  • since the file you are reading may have some quotes that are off you will need to step thru the code that I am going to show you and add in additional conditional checks where needed but it should be more than enough to get you started.. – MethodMan Jun 06 '14 at 19:16
  • @BoratSagdiyev - The idea of the state machine is that just looking at a given character is not information to decide what to do; you also need to know the "state" at that character. For example, if you are currently inside an escaped field, then you don't treat comma as a delimiter - you treat it as data. It's just a fancy way of saying - "read the data one character at a time, and keep track of what you are currently doing, so that you can decide what to do with the current character". – mbeckish Jun 06 '14 at 19:21
  • 1
    sounds like a slow process one char at a time doesn't sound like something the OP is looking for – MethodMan Jun 06 '14 at 19:24
  • @DJKRAZE - You don't have to pull one character at a time from the file - you can read into a buffer and parse the buffer one character at a time. In the end, each character needs to be looked at to decide what to do, no matter what solution you use. – mbeckish Jun 06 '14 at 19:25

2 Answers2

3

some code that should help get you started.. also use the Debugger to step thru the code

Declare a protected static DataTable csvData and assign it null initially

protected static DataTable csvData = null; // declared up top in your class
csvData = GetDataTabletFromCSVFile(fileName); //Converts the CSV File into a DataTable

private static DataTable GetDataTabletFromCSVFile(string csv_file_path)
{
    csvData = new DataTable(defaultTableName);
    try
    {
        using (TextFieldParser csvReader = new TextFieldParser(csv_file_path))
        {
            csvReader.SetDelimiters(new string[]
            {
                tableDelim 
            });
            csvReader.HasFieldsEnclosedInQuotes = true;
            string[] colFields = csvReader.ReadFields();
            foreach (string column in colFields)
            {
                DataColumn datecolumn = new DataColumn(column);
                datecolumn.AllowDBNull = true;
                csvData.Columns.Add(datecolumn);
            }

            while (!csvReader.EndOfData)
            {
                string[] fieldData = csvReader.ReadFields();
                //Making empty value as null
                for (int i = 0; i < fieldData.Length; i++)
                {
                    if (fieldData[i] == string.Empty)
                    {
                        fieldData[i] = string.Empty; //fieldData[i] = null
                    }
                    //Skip rows that have any csv header information or blank rows in them
                    if (fieldData[0].Contains("Disclaimer") || string.IsNullOrEmpty(fieldData[0]))
                    {
                        continue;
                    }
                }
                csvData.Rows.Add(fieldData);
            }
        }
    }
    catch (Exception ex)
    {
    }
    return csvData;
}

fieldData[0].Contains("Disclaimer") this is the column in my .csv file so read and understand the logic very straight forward and make changes to fit your .csv file as needed

if you want to try something easier and then Parse out the "\" Characters you will get when you use the Quick Watch window try this

var lines = File.ReadLines("FilePath of Some .csv File").Select(a => a.Split(',')).ToArray(); 
MethodMan
  • 18,625
  • 6
  • 34
  • 52
1

Use Sebastien Loren's Fast CSV Reader on CodeProject rather than rolling your own.

For reasons why, see The Comma Separated Value (CSV) File Format: Create or parse data in this popular pseudo-standard format . Calling CSV a "standard format" is a mockery of the word "standard".

Your problem is worse since it appears that you've got quoted fields with embedded end-of-line markers. Now you've got to deal with issue of whether you've got a truncated record or a record that spans multiple lines.

If you have any control over the source of the data, consider switching to use JSON, XML or some other quasi-rationale format for data interchange.

Nicholas Carey
  • 71,308
  • 16
  • 93
  • 135
  • unfortunately it's a downloadeable file and I don't have control over it. I know what you mean. I will look into the link. Thanks. – Si8 Jun 06 '14 at 19:38