Using C#, I need to parse a CSV string that doesn't come from a file. I've found a great deal of material on parsing CSV files, but virtually nothing on strings. It seems as though this should be simple, yet thus far I can come up only with inefficient methods, such as this:

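```csharp
using System.IO;
using Microsoft.VisualBasic.FileIO;

var csvParser = new TextFieldParser(new StringReader(strCsvLine));
csvParser.SetDelimiters(new string[] { "," });
csvParser.HasFieldsEnclosedInQuotes = true;
string[] fields = csvParser.ReadFields(); // parse the single record held in strCsvLine
```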

Is there a way to make this more efficient and less ugly? I will be processing huge volumes of strings, so I don't want to pay the setup cost above for every one of them. Thanks.
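If the strings can be joined into one multi-line block, or already arrive that way, a single `TextFieldParser` can read every record in turn, so the parser and the `StringReader` are created once rather than once per line. A minimal sketch, assuming a C# 9+ top-level program on a runtime where `Microsoft.VisualBasic.FileIO` is available (.NET Framework, or .NET Core 3.0 and later); `ParseAll` is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: parses every record in a multi-line CSV string
// with ONE TextFieldParser instead of constructing one per line.
static List<string[]> ParseAll(string csvText)
{
    var records = new List<string[]>();
    using var parser = new TextFieldParser(new StringReader(csvText));
    parser.SetDelimiters(",");
    parser.HasFieldsEnclosedInQuotes = true;

    // ReadFields consumes one record per call until the reader is exhausted.
    while (!parser.EndOfData)
    {
        records.Add(parser.ReadFields());
    }
    return records;
}

var rows = ParseAll("a,b\n\"x,y\",z");
foreach (var row in rows)
{
    Console.WriteLine(string.Join(" | ", row));
}
```

This only helps when the input can be batched; if each string really arrives on its own, the per-string construction cost remains.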

Paul Lambert
  • See http://stackoverflow.com/questions/2081418/parsing-csv-files-in-c-sharp – Jason Fry Oct 10 '14 at 01:56
  • Why do you consider the solution you have at hand inefficient? What efficiency are you expecting from a different solution? – Vikas Gupta Oct 10 '14 at 02:00
  • Thanks -- yes, I saw that SO entry, but it's generally about files, not strings. As for efficiency, I don't think I want to create a new TextFieldParser and a new StringReader for every single string, since this seems hugely wasteful. Still, I'm starting to believe it may not be so bad after all, given the Pandora's Box I've managed to open. – Paul Lambert Oct 10 '14 at 02:04
  • You have a valid CSV string? Split on `Environment.NewLine`, then on commas. What's the problem? – Jonesopolis Oct 10 '14 at 02:38
  • @Jonesy: I'm gonna guess, from the example the OP has given, that they have commas that are enclosed within quotes that shouldn't be split. Still, it's only slightly more complicated. – Matt Burland Oct 10 '14 at 02:48
  • Yes, there are various complications of that nature (and I realize even TextFieldParser might not handle them all). The general consensus is strongly against rolling your own CSV parser. – Paul Lambert Oct 10 '14 at 02:53
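Splitting on commas breaks as soon as a field contains a quoted comma, which is the complication raised in the comments. A small C# illustration as a top-level program; the input string is made up for the demo:

```csharp
using System;

// Naive approach: treat every comma as a field separator.
string line = "x,\"y,w\",z";
string[] naive = line.Split(',');

// The comma inside the quoted field is also split on, so one logical
// field becomes two and the quote characters remain in the values.
Console.WriteLine(naive.Length);               // prints 4
Console.WriteLine(string.Join(" | ", naive));  // prints x | "y | w" | z
```

A correct parser for this line should return three fields: `x`, `y,w`, and `z`.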

1 Answer


Here is a lightly tested parser that handles quoted fields, including commas inside quotes and doubled (`""`) escaped quotes:

```csharp
using System.Collections.Generic;
using System.Text;

public List<string> Parse(string line)
{
    var columns = new List<string>();
    var sb = new StringBuilder();
    bool isQuoted = false;

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];

        // If the current character is a double quote
        if (c == '"')
        {
            // A quote at the start of a field opens a quoted section
            if (!isQuoted && sb.Length == 0)
            {
                isQuoted = true;
            }
            else if (isQuoted && i + 1 < line.Length && line[i + 1] == '"') // Inside quotes, "" is an escaped double quote
            {
                sb.Append('"');
                i++; // Skip the next quote
            }
            else if (isQuoted) // A single quote inside a quoted section closes it
            {
                isQuoted = false;
            }
            else // A quote in the middle of an unquoted field is kept as a literal character
            {
                sb.Append('"');
            }
            continue;
        }

        // If the current character is a comma and we're not inside a quoted section, add the column and clear the StringBuilder
        if (!isQuoted && c == ',')
        {
            columns.Add(sb.ToString());
            sb.Clear();
            continue;
        }

        // Append the character to the current column
        sb.Append(c);
    }

    // Add the last column
    columns.Add(sb.ToString());

    return columns;
}
```
glut
  • I tested against `var examples = new [] { "x,y", "x,\"y\"", "x,\"y\",z", "x,\"y,w\",z", "x,\"y,\"\"w\",z", };` and it works pretty well. Well done! – Enigmativity Jun 01 '22 at 02:42
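Since the question is about processing huge volumes of strings, one small variation on the answer's parser is to let the caller pass in a `StringBuilder` that is reused across calls, saving one builder allocation per line. This is a sketch under that assumption, not a measured improvement; `ParseLine` is a hypothetical name, and the state machine is the answer's own. It is written as a C# 9+ top-level program and runs the examples from the comment above:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical variant of the answer's Parse: the caller supplies the
// StringBuilder so one buffer can be reused for many lines.
static List<string> ParseLine(string line, StringBuilder sb)
{
    var columns = new List<string>();
    bool isQuoted = false;
    sb.Clear(); // discard anything left over from a previous call

    for (int i = 0; i < line.Length; i++)
    {
        char c = line[i];
        if (c == '"')
        {
            if (!isQuoted && sb.Length == 0)
                isQuoted = true;                 // opening quote at field start
            else if (isQuoted && i + 1 < line.Length && line[i + 1] == '"')
            {
                sb.Append('"');                  // "" inside quotes is a literal quote
                i++;
            }
            else if (isQuoted)
                isQuoted = false;                // closing quote
            else
                sb.Append('"');                  // stray quote in an unquoted field
            continue;
        }
        if (!isQuoted && c == ',')
        {
            columns.Add(sb.ToString());          // end of field
            sb.Clear();
            continue;
        }
        sb.Append(c);
    }
    columns.Add(sb.ToString());                  // last field
    return columns;
}

var buffer = new StringBuilder();
var examples = new[] { "x,y", "x,\"y\"", "x,\"y\",z", "x,\"y,w\",z", "x,\"y,\"\"w\",z" };
foreach (var example in examples)
{
    Console.WriteLine(string.Join(" | ", ParseLine(example, buffer)));
}
// x | y
// x | y
// x | y | z
// x | y,w | z
// x | y,"w | z
```

The shared builder must not be used from several threads at once. Like the original, the sketch treats the whole input string as a single record, so any line breaks it contains are kept as field data rather than treated as record separators.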