18

Suppose I have this CSV file:

NAME,ADDRESS,DATE
"Eko S. Wibowo", "Tamanan, Banguntapan, Bantul, DIY", "6/27/1979"

I would like to store each token that is enclosed in double quotes in an array. Is there a safe way to do this instead of using the String Split() function? Currently I load the file into a RichTextBox, loop over its Lines[] property, and do this for each element:

string[] line = s.Split(',');

s is a reference to an element of RichTextBox.Lines[]. As you can clearly see, a comma inside a token easily breaks Split(): instead of the three tokens I want, I end up with six.

Any help will be appreciated!

swdev
  • 1
    http://stackoverflow.com/questions/2081418/parsing-csv-files-in-c-sharp – chancea Jun 20 '13 at 07:07
  • 1
    Unless you want to display anything, do not (ab)use GUI components for data storage. If you need the contents of the file line by line, use the [`File.ReadLines` method](http://msdn.microsoft.com/en-us/library/dd383503.aspx). – O. R. Mapper Jun 20 '13 at 07:07
  • http://stackoverflow.com/questions/769621/dealing-with-commas-in-a-csv-file – Satpal Jun 20 '13 at 07:11
  • @O.R.Mapper You're absolutely right! I'll change my code design for that – swdev Jun 21 '13 at 13:03
  • @chancea CsvHelper and CsvReader in that link should be good, but I think I will go with the solution that uses RegEx. :) Thanks! – swdev Jun 21 '13 at 13:13
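
To illustrate the `File.ReadLines` suggestion from the comments, here is a minimal sketch that reads the file without any GUI component. The class and method names (`CsvFileLines`, `ReadDataLines`) are made up for this example, and it assumes the first line is a header row, as in the sample above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvFileLines
{
    // Hypothetical helper: lazily yields the data lines of a CSV file.
    // Assumption: the first line is a header (NAME,ADDRESS,DATE) and is skipped.
    public static IEnumerable<string> ReadDataLines(string path)
    {
        return File.ReadLines(path)                                    // streams the file line by line
                   .Skip(1)                                            // skip the header row
                   .Where(line => !string.IsNullOrWhiteSpace(line));   // ignore blank lines
    }
}
```

Each string returned by `CsvFileLines.ReadDataLines(path)` can then be handed to whatever splitting approach you choose, in place of the `RichTextBox.Lines[]` elements.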

6 Answers

26

You could use regex too:

string input = "\"Eko S. Wibowo\", \"Tamanan, Banguntapan, Bantul, DIY\", \"6/27/1979\"";
string pattern = @"""\s*,\s*""";

// input.Substring(1, input.Length - 2) removes the first and last " from the string
string[] tokens = System.Text.RegularExpressions.Regex.Split(
    input.Substring(1, input.Length - 2), pattern);

This will give you:

Eko S. Wibowo
Tamanan, Banguntapan, Bantul, DIY
6/27/1979
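
If you want to reuse this per line, the same idea can be wrapped in a small helper. This is only a sketch: the class name `QuotedCsvLine` is made up, and it assumes every field on the line is wrapped in double quotes and that no field contains an escaped quote.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCsvLine
{
    // Separator between two quoted fields: closing quote, comma, opening quote,
    // with optional whitespace around the comma.
    private static readonly Regex Separator = new Regex(@"""\s*,\s*""");

    // Hypothetical helper. Assumption: ALL fields are double-quoted and
    // none of them contains an escaped quote.
    public static string[] Split(string line)
    {
        string trimmed = line.Trim();
        if (trimmed.Length < 2 || trimmed[0] != '"' || trimmed[trimmed.Length - 1] != '"')
        {
            throw new FormatException("Expected a line of double-quoted fields.");
        }

        // Drop the outer quotes, then split on the quote-comma-quote separators.
        return Separator.Split(trimmed.Substring(1, trimmed.Length - 2));
    }
}
```

Calling `QuotedCsvLine.Split(s)` on the sample line yields the three tokens shown above; a line that does not start and end with a double quote is rejected instead of being split incorrectly.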
unlimit
  • I accepted this as the answer because I always want to improve my RegEx skills, and this solution will actually be part of a PHP solution, which also depends heavily on RegEx for this purpose, so a .NET-only solution would not be a good idea. I am sorry that I did not elaborate on that earlier. I got this idea when I read the answer by @unlimit: a simple RegEx is the way to go! – swdev Jun 21 '13 at 13:17
  • 4
    This is a fine solution, but a word of caution: not every CSV file will **always** put quotes around each value. For example, a CSV file exported from Excel only quotes values that contain commas, quotes, etc. – chancea Jun 21 '13 at 14:17
  • 15
    A better pattern would be `""?\s*,\s*""?`, so that it matches columns which don't have double quotes too. Sometimes CSV files have numerical values without the double quotes. – Adam K Dean Oct 20 '14 at 11:21
9

I've done this with my own method. It simply counts the number of " and ' characters and only treats a comma as a separator when that count is even.
Adapt it to your needs.

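    public List<string> SplitCsvLine(string s) {
        int i;
        int a = 0;      // start index of the current token
        int count = 0;  // number of quote characters seen so far
        List<string> str = new List<string>();
        for (i = 0; i < s.Length; i++) {
            switch (s[i]) {
                case ',':
                    // An even count means we are outside quotes,
                    // so this comma is a real separator.
                    if ((count & 1) == 0) {
                        // The token keeps its surrounding quotes and whitespace.
                        str.Add(s.Substring(a, i - a));
                        a = i + 1;
                    }
                    break;
                case '"':
                case '\'': count++; break;
            }
        }
        // Add the last token (everything after the final separator).
        str.Add(s.Substring(a));
        return str;
    }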
joe
  • By including both `"` and `'` in the counter you'll incorrectly parse something with mixed quotes: `" \"The quote's break\", this "` – drzaus May 26 '16 at 16:14
  • @drzaus: That's correct. The actual method is more complicated and has a lot of counters for different things. The shown code is only meant to show the basic idea. – joe May 30 '16 at 09:51
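
For what it's worth, the mixed-quote problem raised in the comments goes away if only double quotes are tracked. The following is a rough sketch of that variant, not the answerer's actual method: the names `SimpleCsv` and `SplitLine` are made up, it assumes one record per line (no line breaks inside quoted fields), and it treats a doubled `""` inside a quoted field as a literal quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SimpleCsv
{
    // Hypothetical helper. Assumptions: one record per line, comma as the
    // delimiter, and only double quotes (not apostrophes) enclose fields.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;   // currently inside a quoted field
        bool wasQuoted = false;  // the current field had an opening quote

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c != '"')
                {
                    current.Append(c);          // commas and apostrophes are plain text here
                }
                else if (i + 1 < line.Length && line[i + 1] == '"')
                {
                    current.Append('"');        // doubled quote = literal quote
                    i++;
                }
                else
                {
                    inQuotes = false;           // closing quote
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
                wasQuoted = true;
                current.Clear();                // drop anything (e.g. spaces) before the opening quote
            }
            else if (c == ',')
            {
                fields.Add(wasQuoted ? current.ToString() : current.ToString().Trim());
                current.Clear();
                wasQuoted = false;
            }
            else if (!wasQuoted || !char.IsWhiteSpace(c))
            {
                current.Append(c);              // unquoted text; whitespace after a closing quote is skipped
            }
        }

        // Add the last field (an unterminated quote simply runs to the end of the line).
        fields.Add(wasQuoted ? current.ToString() : current.ToString().Trim());
        return fields;
    }
}
```

Unlike the counting version, this returns the fields without their surrounding quotes, and unquoted fields are trimmed. It is deliberately lenient about malformed input, so a dedicated CSV library is still the safer choice for arbitrary files.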
2

It's not an exact answer to your question, but why not use an existing library to handle CSV files? A good example is LinqToCsv. CSV files can be delimited by various punctuation characters, and there are other gotchas that library authors have already addressed, such as handling the header row, dealing with different date formats, and mapping rows to C# objects.

Yaroslav Yakovlev
0lukasz0
2

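You can replace each `","` with `;` and then split on `;`. Note that this assumes there is no whitespace around the separating commas and that no value contains a semicolon; the first and last values also keep their outer quote.

var values = s.Replace("\",\"", ";").Split(';');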
Abdul Hadi
0

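If your CSV line is tightly packed, it's easiest to strip the leading and trailing quote as mentioned earlier and then do a simple split on the joining string `","`. The array overload of Split is used here because Split(string) is not available on .NET Framework:

 string[] tokens = input.Substring(1, input.Length - 2).Split(new[] { "\",\"" }, StringSplitOptions.None);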

This will only work if ALL fields are double-quoted, even those that don't (officially) need to be. It should be faster than RegEx, but only under those conditions.

Really useful if your data looks like "Name","1","12/03/2018","Add1,Add2,Add3","other stuff"

TaSwavo
0

Five years old but there is always somebody new who wants to split a CSV.

If your data is simple and predictable (i.e. never has any special characters like commas, quotes and newlines) then you can do it with split() or regex.

But to support all the nuances of the CSV format properly without code soup you should really use a library where all the magic has already been figured out. Don't re-invent the wheel (unless you are doing it for fun of course).

CsvHelper is simple enough to use:

https://joshclose.github.io/CsvHelper/2.x/

using (var parser = new CsvParser(textReader))
{
    while(true)
    {
        string[] line = parser.Read();

        if (line != null)
        {
            // do something
        }
        else
        {
            break;
        }
    }
}

More discussion / same question: Dealing with commas in a CSV file

Etherman