2

How do I ignore commas inside double quotes and skip the CSV header line (the first line)?

string csvData = File.ReadAllText(csvPath);
foreach (string row in csvData.Split('\n'))
{
    if (!string.IsNullOrEmpty(row))
    {
        dt.Rows.Add();
        int i = 0;
        foreach (string cell in row.Split(','))
        {
            dt.Rows[dt.Rows.Count - 1][i] = cell;
            i++;
        }
    }
}
user406151

3 Answers

4

The TextFieldParser class (namespace Microsoft.VisualBasic.FileIO, in Microsoft.VisualBasic.dll) can handle both requirements:

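// Requires a reference to Microsoft.VisualBasic.dll.
using Microsoft.VisualBasic.FileIO;
using System.Data;

// Assumes dt already has at least as many columns as the CSV has fields.
using (TextFieldParser MyReader = new TextFieldParser(csvPath))
{
    MyReader.TextFieldType = FieldType.Delimited;
    MyReader.SetDelimiters(",");
    // Commas inside double quotes stay part of the field.
    MyReader.HasFieldsEnclosedInQuotes = true;

    // Read and discard the header line.
    MyReader.ReadFields();

    while (!MyReader.EndOfData)
    {
        string[] currentRow = MyReader.ReadFields();
        DataRow row = dt.NewRow();
        for (int i = 0; i < currentRow.Length; i++)
        {
            row[i] = currentRow[i];
        }
        dt.Rows.Add(row);
    }
}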

In my limited experience this class is not very fast, but it lets you avoid an external package that you would need to redistribute with your application.

Steve
0

I'll give you the algorithm in general pseudo-code because this question is not specific to C#; it boils down to knowing whether you are inside an open double quote or not.

  1. Keep a flag saying whether you're inside double quotes or not
  2. Read every row character by character
  3. Simply toggle this flag when you hit a double quote so it reverses its value
  4. When you read a comma and the flag is false, you can dump all characters read thus far as the current cell value, and start accumulating characters again for the new current cell
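
The steps above could be sketched in C# roughly as follows. The class and method names (`CsvLineSplitter`, `SplitRow`) are made up for illustration, and the sketch assumes that each row fits on one line. An escaped quote (`""`) keeps the flag correct, but the literal quote character is dropped from the cell.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CsvLineSplitter
{
    // Hypothetical helper: splits one CSV row on commas that are not
    // inside double quotes. Quote characters themselves are discarded.
    public static List<string> SplitRow(string row)
    {
        var cells = new List<string>();
        var current = new StringBuilder();
        bool insideQuotes = false;              // step 1: the flag

        foreach (char c in row)                 // step 2: character by character
        {
            if (c == '"')
            {
                insideQuotes = !insideQuotes;   // step 3: toggle the flag
            }
            else if (c == ',' && !insideQuotes)
            {
                cells.Add(current.ToString());  // step 4: emit the current cell
                current.Clear();
            }
            else
            {
                current.Append(c);              // ordinary character (or quoted comma)
            }
        }

        // The last cell has no trailing comma, so emit it after the loop.
        cells.Add(current.ToString());
        return cells;
    }
}
```

For example, `CsvLineSplitter.SplitRow("1,\"Smith, John\",NY")` yields the three cells `1`, `Smith, John` and `NY`. If the rows come from splitting the file on `'\n'`, a trailing `'\r'` may need to be trimmed first.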
Louis
0

Use

string[] cols = Regex.Split(row, "\"(,\")?");

instead of a plain Split.

To ignore the first line, use a for loop or keep a counter in your foreach to skip the first iteration.

The regex is off the top of my head, so it might need some adjustment, but it should work.

Something like this can get complex, though, so using a CSV library might be worth considering.
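
As a rough sketch of how the two suggestions might fit together, the following uses a for loop that starts at index 1 to skip the header, and splits each line with a lookahead pattern (the one proposed in the comment under this answer, not the pattern above). The names `CsvRegexSplitter` and `ParseWithoutHeader` are hypothetical, and the sketch assumes no quoted field contains a line break or an escaped quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CsvRegexSplitter
{
    // Matches a comma only when an even number of double quotes follows it
    // on the line, i.e. the comma is outside any quoted field.
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)", RegexOptions.Compiled);

    // Hypothetical helper: returns the cells of every data row,
    // skipping the header line and any empty lines.
    public static List<string[]> ParseWithoutHeader(string csvData)
    {
        var rows = new List<string[]>();
        string[] lines = csvData.Split('\n');

        for (int i = 1; i < lines.Length; i++)      // start at 1 to skip the header
        {
            string line = lines[i].TrimEnd('\r');   // tolerate Windows line endings
            if (string.IsNullOrEmpty(line)) continue;

            string[] cells = Separator.Split(line);
            for (int j = 0; j < cells.Length; j++)
            {
                // Strip leading and trailing quote characters from the cell.
                cells[j] = cells[j].Trim('"');
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

Each returned array can then be copied into a `DataRow` the same way the question's inner loop does. The lookahead rescans the rest of the line for every comma, so this approach is likely to be slow on large files.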

silverfighter
  • That regex doesn't appear to work, but ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)" does, although it's very slow to evaluate at scale. – jspinella May 22 '20 at 09:13