0

I have written a desktop app to help some coworkers process some huge .csv files they have. Each "column" within a line (row) is in quotation marks, so it looks something like this:

"something", "blah-blah", "another thing", "etc and so forth"

My simple little program reads a line, uses the String.Split(',') function to get an array of values, and off I go to do my processing...UNTIL I hit a row like this:

"something", "blah-blah", "Values, 1, 2, 3", "etc and so forth"

The commas inside the quoted value make Split break that value apart, so this row comes back as seven pieces instead of four.

Is there an "easy" (built-in) way to read these lines that correctly parses the example above? I want to avoid having to write my own logic to trudge through each line.

I suspect that using Regular Expressions may be the key to happiness.

Thanks, in advance, for any help you can provide.

rogdawg
  • 3
    You'll find yourself repeatedly chasing down problems like this if you try to parse CSV files using String.Split, regex, or other hand-rolled simple solutions. There are several free libraries out there that do CSV file handling very well. You're way better off using one of them. – hatchet - done with SOverflow Jun 28 '13 at 19:40
  • Asked and answered here? http://stackoverflow.com/questions/171480/regex-grabbing-values-between-quotation-marks – Bill Gregg Jun 28 '13 at 19:40
  • Don't reinvent the wheel! http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader – Tim Schmelter Jun 28 '13 at 19:40
  • 1
    "I suspect that using Regular Expressions may be the key to happiness." I suspect that you have just answered your own question :) – Brian Jun 28 '13 at 19:40
  • 1
    @Brian: I suspect not, there are better ways. – Tim Schmelter Jun 28 '13 at 19:42
  • 3
    "I suspect that using Regular Expressions may be the key to happiness." Now you have two problems. – Servy Jun 28 '13 at 19:44
  • 1
    Have a look at this http://stackoverflow.com/questions/1405038/reading-csv-files-in-net unless you are mad keen on re-inventing wheels – Tony Hopkinson Jun 28 '13 at 19:44
  • 1
    Not an expert on regular expressions, but I don't think this can be achieved using them. REs are stateless – Tigran Jun 28 '13 at 19:45
  • @Tigran yeah, it's not a good case for RegEx; any language which uses balanced (one closing for every opening) markers of any sort to denote scope (braces, parens, quotes, etc.) is not regular. – evanmcdonnal Jun 28 '13 at 20:12
  • If you're parsing CSV files, you should use `TextFieldParser`: http://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.textfieldparser.aspx – Jim Mischel Jun 28 '13 at 21:05
  • Wow! Thanks very much, everyone. This is eye-opening. I think I got the little project to work sufficiently well for the task at hand. I used a simple regular expression. But, going forward, the libraries posted here seem like a much more robust way to go. Thanks, again. – rogdawg Jun 29 '13 at 11:21

2 Answers

2

There are a lot of edge cases when dealing with quoted strings in CSV and the commas and quotes within them. I'd recommend using a library like CsvHelper (or one of the others available on NuGet) that has already worked out the logic and tested it.

Other options:
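One built-in option, raised in the comments on the question, is the `TextFieldParser` class in `Microsoft.VisualBasic.FileIO`. The following is only a minimal sketch: the `CsvLineReader` class and `ReadRows` method are hypothetical names chosen for illustration, and it assumes comma-delimited input with fields wrapped in double quotes, as in the question's sample. With these settings, a row such as the "Values, 1, 2, 3" example should come back as four fields rather than seven.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // on .NET Framework, reference Microsoft.VisualBasic.dll

// Hypothetical helper, not part of any library.
public static class CsvLineReader
{
    // Reads every row from the source and returns each row as an array of field values.
    public static List<string[]> ReadRows(TextReader source)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(source))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat text between double quotes as one field, even if it contains commas.
            parser.HasFieldsEnclosedInQuotes = true;
            // Drop leading and trailing whitespace around each field.
            parser.TrimWhiteSpace = true;

            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

To read a file rather than a string, pass a `StreamReader` opened on the file; the parser disposes the reader it was given when it is disposed.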

Chris Doggett
0

You can trim the first and last quotation marks off the line, so that it looks like this:

something", "blah-blah", "Values, 1, 2, 3", "etc and so forth

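then you can split on the three-character separator ", " (quote, comma, space, quote is what sits between two fields), with the trimmed text in a variable named line:

 line.Split(new[] { "\", \"" }, StringSplitOptions.None);

or do the split first, then strip the leftover quotation marks from the first and last pieces with .Replace(@"""", "");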
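Put together, the trim-then-split idea might look like the sketch below. The `QuotedLineSplitter` class and `SplitQuotedLine` method are hypothetical names used only for illustration. It assumes every field is quoted and that fields are separated by exactly a comma followed by one space; a field that itself contains the sequence `", "` or a line with different spacing would be split incorrectly.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class QuotedLineSplitter
{
    public static string[] SplitQuotedLine(string line)
    {
        string trimmed = line.Trim();

        // Remove the opening quote of the first field and the closing quote of the last.
        if (trimmed.Length >= 2 && trimmed[0] == '"' && trimmed[trimmed.Length - 1] == '"')
        {
            trimmed = trimmed.Substring(1, trimmed.Length - 2);
        }

        // Split on the quote-comma-space-quote sequence that sits between two fields.
        return trimmed.Split(new[] { "\", \"" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string line = "\"something\", \"blah-blah\", \"Values, 1, 2, 3\", \"etc and so forth\"";
        foreach (string field in SplitQuotedLine(line))
        {
            Console.WriteLine(field);
        }
        // Prints four lines:
        // something
        // blah-blah
        // Values, 1, 2, 3
        // etc and so forth
    }
}
```

The `string[]` overload of `Split` is used because it is available on both .NET Framework and later runtimes.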

Jonesopolis