-2

i have a CSV in the below way. "India,Inc" is a company name which is single value which contains , in it

How to Get the Values in LINQ

12321,32432,423423,Kevin O'Brien,"India,Inc",234235,23523452,235235

  • This may help you: http://stackoverflow.com/questions/5116604/read-csv-using-linq – Holger Leichsenring Apr 29 '15 at 14:45
  • CSV comma delimited, means that you should NOT use commas on the data it will always be separate India as one column and Inc as another column, if you have control of the data you should change it as tab delimited, if you dont then you can create a condition once you split the strings to join those 2 colums, that is assuming that the data will always be similar – Christopher Cabezudo Rodriguez Apr 29 '15 at 14:55
  • @ChristopherCabezudoRodriguez That is incorrect. Please read RFC 4180 which defines the text/csv mimetype http://tools.ietf.org/html/rfc4180 – Robert McKee Apr 29 '15 at 15:44
  • possible duplicate of [C# Regex Split - commas outside quotes](http://stackoverflow.com/questions/3147836/c-sharp-regex-split-commas-outside-quotes) – Parker Apr 29 '15 at 15:48
  • 1
    This is more a question about regular expressions than a LINQ one. – Robert McKee Apr 29 '15 at 15:51
  • @RobertMcKee Good to know, i didn't knew you could encapsulate that on double quotes. that way THANKS! – Christopher Cabezudo Rodriguez Apr 29 '15 at 16:41

6 Answers6

0

Assuming that you will always have the columns that you specify and that the only variable is that company name can have commas inside, this UGLY code can help you achieve your goal.

        var file = File.ReadLines("test.csv");

        var value = from p in file
                     select new string[]
                    { p.Split(',')[0],
                      p.Split(',')[1],
                      p.Split(',')[2],
                      p.Split(',')[3],
                      p.Split(',').Count() == 7 ? p.Split(',')[4] :
                        (p.Split(',').Count() > 7 ? String.Join(",",p.Split(',').Skip(4).Take(p.Split(',').Count() - 7).ToArray() ) : ""),
                      p.Split(',')[p.Split(',').Count() - 3],
                      p.Split(',')[p.Split(',').Count() - 2],
                      p.Split(',')[p.Split(',').Count() - 1]
                     };
0

A regular expression would work, bit nasty due to the recursive nature but it does achieve your goal.

        List<string> matches = new List<string>();

        string subjectString = "12321,32432,423423,Kevin O'Brien,\"India,Inc\",234235,23523452,235235";

        Regex regexObj = new Regex(@"(?<="")\b[123456789a-z,']+\b(?="")|[123456789a-z']+", RegexOptions.IgnoreCase);
        Match matchResults = regexObj.Match(subjectString);
        while (matchResults.Success)
        {
            matches.Add(matchResults.Value);
            // matched text: matchResults.Value
            // match start: matchResults.Index
            // match length: matchResults.Length
            matchResults = matchResults.NextMatch();
        }
DotNetHitMan
  • 931
  • 8
  • 20
0

This should suffice in most cases. It handles quoted strings, strings with double quotes within them, and embedded commas.

var subjectString = "12321,32432,423423,Kevin O'Brien,\"India,Inc\",234235,\"Test End\"\"\",\"\"\"Test Start\",\"Test\"\"Middle\",23523452,235235";
var result=Regex.Split(subjectString,@",(?=(?:[^""]*""[^""]*"")*[^""]*$)")
  .Select(x=>x.StartsWith("\"") && x.EndsWith("\"")?x.Substring(1,x.Length-2):x)
  .Select(x=>x.Replace("\"\"","\""));

It does however break, if you have a field with a single double quote inside it, and the string itself is not enclosed in double quotes -- this is invalid in most definitions of a CSV file, where any field that contains CR, LF, Comma, or Double quote must be enclosed in double quotes.

You should be able to reuse the same Regex expression to break on lines as well for small CSV files. Larger ones you would want a better implementation. Replace the double quotes with LF, and remove the matching ones (unquoted LF's). Then use the regular expression again replacing the quotes with CR, and split on matching.

Robert McKee
  • 21,305
  • 1
  • 43
  • 57
0

Another option is to use CSVHelper and not traying to reinvent the wheel

        var csv = new CsvHelper.CsvReader(new StreamReader("test.csv"));
        while (csv.Read())
        {
            Console.WriteLine(csv.GetField<int>(0));
            Console.WriteLine(csv.GetField<string>(1));
            Console.WriteLine(csv.GetField<string>(2));
            Console.WriteLine(csv.GetField<string>(3));
            Console.WriteLine(csv.GetField<string>(4));

        }

Guide

0

I would recommend LINQ to CSV, because it is powerful enough to handle special characters including commas, quotes, and decimals. They have really worked a lot of these issues out for you.

It only takes a few minutes to set up and it is really worth the time because you won't run into these types of issues down the road like you would with custom code. Here are the basic steps, but definitely follow the instructions in the link above.

  1. Install the Nuget package
  2. Create a class to represent a line item (name the fields the way they're named in the csv)
  3. Use CsvContext.Read() to read into an IEnumerable which you can easily manipulate with LINQ
  4. Use CsvContext.Write() to write a List or IEnumerable to a CSV

This is very easy to setup, has very little code, and is much more scalable than doing it yourself.

-1

becuase you're only reading values delminated bycommas, the spaces shouldn't cause an issue if you just treat them like any other character.

var values = File.ReadLines(path)
    SelectMany(line => line.Split(','));
David
  • 10,458
  • 1
  • 28
  • 40