1

I'm working in closed envrionment where I cannot install additional packages and have limited ability to use .Net framework classes. Plus I have no control over the CSV file format that I'm receiving.

I receive a CSV file that must be pulled into our business system and updates the database.

I can pull the file in to a DataTable via the below code ...

CSV File Ex:
Order#     Qty       Description                             ...
12345      3         desc1, desc2, desc3, etc..


while (!sr.EndOfStream)
{
    string[] rows = sr.ReadLine().Split(',');

    DataRow dr = dt.NewRow();
    for (int i = 0; i < rows.Length; i++)
    {
        dr[i] = rows[i];
    }
    dt.Rows.Add(dr);
}

However, the problem is that one field in the CSV file is a description that contains multiple "," characters. Doing the above loads each comma separated word set in the description value into its own index in the rows array.

Currently there should be a total of 10 columns in the csv file but with the description field issue the number of columns vary depending on the length/number of commas in the description field...10, 15, 22 columns etc.

I have no control over the format of the CSV file before it's sent. Is there any way to get around this. Even skipping over this field when creating the DataTable would be fine for my purposes.

Thanks

Fstagger
  • 55
  • 1
  • 7
  • Possible duplicate of [Dealing with commas in a CSV file](http://stackoverflow.com/questions/769621/dealing-with-commas-in-a-csv-file) – Pranav Patel Jun 02 '16 at 11:28
  • Use CsvHelper. https://www.nuget.org/packages/CsvHelper/ – Sebastian Siemens Jun 02 '16 at 11:28
  • Don't write CSV parsing code yourself, use a library. – CodeCaster Jun 02 '16 at 11:32
  • @PranavPatel that solution assumes that the commas in the CSV data are escaped. The OP mentions that he has no control over the input. – Wicher Visser Jun 02 '16 at 11:33
  • If the columns have always a TAB, then try to split it with `string[] rows = sr.ReadLine().Split('\t');` – M. Schena Jun 02 '16 at 11:34
  • Is the Description field also the last field? Or, perhaps, are the fields always in the same order? – Michael Gorsich Jun 02 '16 at 11:36
  • the description field is one of the inner columns, but is always in the same place. – Fstagger Jun 02 '16 at 11:51
  • Let me make sure I have something straight. The "rows" array in your code...isn't that actually splitting on column, so that each array element is one column (or field) in a row (sr.ReadLine() return)? – Michael Gorsich Jun 02 '16 at 12:00
  • rows contains an entire row with the split values...(order num, qty, desc) or (12345,3,desc1,desc2,desc3). Essentially yes each index of the rows array contains the column data for that row. The desc1, desc2, desc3 is wrapped in double quotes within the csv file however, the split(',') seems to ignore the field as a string and separates each desc1 - desc2 - desc3 into its own index in the array. That causes a mismatch between the column headers and the actual column data in the DT. – Fstagger Jun 02 '16 at 12:09
  • @Fstagger: Gotcha. That helps a lot. I was under the impression that you were dealing with malformed CSV, but since Description is inside double-quotes that doesn't seem to be the case. – Michael Gorsich Jun 02 '16 at 12:17

5 Answers5

1

You can use textqualifier to enclose every field so that the commas or semicolons are not considered as delimeters. The following method should fix the problem.

Install-Package CsvHelper

 public static DataTable ReadCSVToDataTable(string path)
    {
        CsvHelper.Configuration.CsvConfiguration config = new CsvHelper.Configuration.CsvConfiguration();
        config.Delimiter = delimeter;
        config.Encoding = new UTF8Encoding(false);
        if (string.IsNullOrEmpty(textQualifier))
        {
            config.QuoteAllFields = false;
        }
        else
        {
            char qualifier = textQualifier.ToCharArray()[0];
            config.Quote = qualifier;
            config.QuoteAllFields = true;
        }
        DataTable dt = new DataTable();
        using (var sr = new StreamReader(path))
        {
            using (var reader = new CsvReader(sr, config))
            {
                int j = 0;
                while (reader.Read())
                {
                    if (j == 0)
                    {
                        if (config.HasHeaderRecord)
                        {
                            foreach (string header in reader.FieldHeaders)
                                dt.Columns.Add(header);
                        }
                        else
                        {
                            for (int i = 0; i < reader.CurrentRecord.Length; i++)
                                dt.Columns.Add();
                        }
                        j++;
                    }
                    AddRow(dt, reader);
                }
            }

        }
        return dt;
    }
xyz
  • 762
  • 7
  • 24
0

It looks like your CSV has fixed size columns padded with spaces. So I guess you'd be better off reading a fixed amount of characters for each column and trim the trailing spaces, instead of splitting with comma.

kagelos
  • 423
  • 8
  • 19
0

Try this class. It deals with commas how you need.

Disappointed
  • 1,100
  • 1
  • 9
  • 21
  • Unfortunately, the ERP software we're in won't allow us to install/reference external assemblies. The tab delimited split didn't work it comes in as one line. The CSV has " " around the description field...ie "desc1, desc2" is there some way to identify and escape for " in the split method? – Fstagger Jun 02 '16 at 11:49
  • Does ERP don't allow even to use .net Framework classes ? – Disappointed Jun 02 '16 at 11:50
  • Yes we can use .Net framework classes but it is a pain to get it work correctly. Not worth it from a maintainability standpoint for us. – Fstagger Jun 02 '16 at 11:56
0

Fstagger, this should work for you assuming you have only one column with internal comma's and the CSV is formed properly (especially if the Description field begins with ," and ends with ",. You need to replace my example INDEX_OF_DESCRIPTION with the actual value.

int iDescStart = 0;
int iDescEnd = 0;
string zLine = "";

const int INDEX_OF_DESCRIPTION = 3;
const char SEPARATOR   = '\u001F';  //ASCII Unit Separator, decimal 31

while(!sr.EndOfStream){
   zLine = sr.ReadLine();

   iDescStart = zLine.IndexOf(",\"");
   iDescEnd = zLine.IndexOf("\",");
   zLine = zLine.Substring(0, iDescStart)
         + ","
         + zLine.Substring(iDescStart + 2, iDescEnd - iDescStart - 2).Replace(',', SEPARATOR)
         + ","
         + zLine.Substring(iDescEnd + 2);
   string[] zaFields = zLine.Split(',');
   zaFields[INDEX_OF_DESCRIPTION] = zaFields[INDEX_OF_DESCRIPTION].Replace(SEPARATOR, ',');

   datarow dr = dt.NewRow();
   for (int i = 0; i < zaFields.Length; i++){
      dr[i] = zaFields[i];
   }
   dt.Rows.Add(dr);
}

Let me know if this works for you : )

Michael Gorsich
  • 327
  • 1
  • 10
  • Thanks Michael...I messed around with it more and came up with my own workaround. I didn't try the code but I'm sure it would produce the same result. Thanks again! – Fstagger Jun 02 '16 at 14:02
0

My Solution that ended up working

while (!sr.EndOfStream)
{                      
    string[] rows = sr.ReadLine().Split(',');
    var fullrow = String.Empty;

    foreach (var entry in rows)
    {
        fullrow += entry.ToString() + ",";
    }

    var startQuote = fullrow.IndexOf("\"");
    var endQuote = fullrow.IndexOf("\"", startQuote + 1); //LastIndexOf("\"");

    if (startQuote > -1 && endQuote > -1)
    {    
        var substring = fullrow.Substring(startQuote, Math.Abs(startQuote - endQuote));
        substring = substring.Replace(',', ' ');
        fullrow = fullrow.Remove(startQuote, Math.Abs(startQuote - endQuote)).Insert(startQuote, substring);
    }

    rows = fullrow.Split(',');       

    DataRow dr = dt.NewRow();
    for (int i = 0; i < rows.Length; i++)
    {
        dr[i] = rows[i];
    }
    dt.Rows.Add(dr);
}

Thanks @Michael Gorsich for the alternate code!

Fstagger
  • 55
  • 1
  • 7