1

I have a CSV that looks like this. My goal is to extract each entry (notice I said entry, not line), where an entry starts from the first column and stretches to the last column, and may span multiple lines. I'd like to extract an entry without ruining the formatting. For example, I do not want the following to be considered four seperate lines,

Eg. 1, One Column Multiple Lines

...,"1. copy ctor
2. copy ctor
3. declares function
4. default ctor",... // Where ... represents the columns before and after

but rather a column in one entry that can be represented as such

Eg. 2, One Column Single Line

"1. copy ctor\n2.copy ctor\ndeclares function\n4.default ctor"

enter image description here

When I iterate over the CSV, as such, I get Eg. 1. I'm not sure why splitting on a comma is treating a new line as a comma.

using (var streamReader = new StreamReader("results-survey111101.csv"))
{
   string line;
   while ((line = streamReader.ReadLine()) != null)
   {
       string[] splitLine = line.Split(',');
       foreach (var column in splitLine)
          Console.WriteLine(column);
   }
}

If someone can show me what I need to do to get these multi line CSV columns into one line that maintains the formatting (e.g. adds \t or \n where necessary) that would be great. Thanks!

2 Answers2

0

Assuming your source file is valid CSV, variability in the data is really hard to account for. That's all I'll say, but I'll link you to another SO answer if you need convincing that writing your own CSV parser is a horrible task. Reading CSV files using C#

Let's assume you are going to take advantage of an existing CSV reader library. I'll use TextFieldParser from the Microsoft.VisualBasic library as is used in the example answer I linked.

Your task is to read your source file line by line, and validate whether the line is a complete CSV entry on it's own, or if it forms part of a broken line.

If it forms part of a broken line, we need to remember the line and add the next line to it before attempting validation again.

For this we need to know one thing: What is the expected number of fields each data entry row should have?

int expectedFieldCount = 7;                        
string brokenLine = "";

using (var streamReader = new StreamReader("results-survey111101.csv")) 
{
    string line;
    while ((line = streamReader.ReadLine()) != null) // read the next line
    {
        // if the previous line was incomplete, add it to the current line, 
        // otherwise use the current line
        string csvLineData = (brokenLine.Length > 0) ? brokenLine + line : line;

        try
        {
            using (StringReader stringReader = new StringReader(csvLineData ))                       
            using (TextFieldParser parser = new TextFieldParser(stringReader))
            {                            
                parser.SetDelimiters(",");
                while (!parser.EndOfData)
                {                               
                    string[] fields = parser.ReadFields(); // tests if the line is valid csv       
                    if (expectedFieldCount == fields.Length)
                    {
                        // do whatever you want with the fields now.
                        foreach (var field in fields)
                        {
                            Console.WriteLine(field);
                        }                                    
                        brokenLine = ""; // reset the brokenLine
                    }
                    else // it was valid csv, but we don't have the required number of fields yet
                    {
                        brokenLine += line + @"\r\n";                                    
                        break;
                    }
                }
            }
        }
        catch (Exception ex) // the current line is NOT valid csv, update brokenLine
        {
            brokenLine += (line + @"\r\n");                        
        }
    }
}        

I am replacing the line breaks that broken lines contain with \r\n literals. You can display these in your resulting one-liner field however you want. But you shouldn't expect to be able to copy paste the result into notepad and see line breaks.

SeanOB
  • 752
  • 5
  • 24
0

One assumes you have the same number of columns in each record. Therefore in your code where you do your Split you can merely sum the length of splitLine into a running columnsReadCount until they equal the desired columnsPerRecordCount. At that point you have read all the record and can reset the running columnsReadCount back to zero ready for the next record to read.

cineam mispelt
  • 393
  • 1
  • 8