1

I have a large string containing data read from a CSV file. However, when I use a regular expression such as:

Regex regex = new Regex(@"\w+|""[\w\s]*""");

it splits on every letter instead. There are no spaces within a line, only at the end of each line, and the regex shouldn't split a field where there is a space inside double quotes.

Example: test1,test2,test3,test4,test5,"test 6",test7 (new line)test8,test9,etc.

Can somebody point me in the right direction? Thanks.

Mark W
  • Can you use an existing library, such as: http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader? – Eric Aug 06 '12 at 15:18

2 Answers

4

I recommend using an existing solution rather than writing your own (unless you're going for the learning experience!). Parsing CSV is trickier than it seems.

EDIT: Didn't see you were using C#. Here are more links.

Andrew Cheong
  • The main trickiness with parsing CSV is figuring out what the exact rules are. There are so many different variants, with slight differences. Actually parsing it once you've figured out which rules you need isn't that hard, even when you do it manually. – CodesInChaos Aug 06 '12 at 15:22
1

Use an existing CSV parser instead of trying to use Regex - the format is subtle, as you have seen.

FileHelpers is one popular library for this, and there is also the TextFieldParser class in the Microsoft.VisualBasic.FileIO namespace.
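As a rough illustration of the TextFieldParser route, here is a minimal sketch. It assumes the CSV data is already in a string (as in the question), that fields are comma-delimited with optional double quotes, and that the project references the Microsoft.VisualBasic assembly. The class name CsvSketch and the helper ParseCsv are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // needs a reference to the Microsoft.VisualBasic assembly

public static class CsvSketch
{
    // Hypothetical helper: splits CSV text into rows of fields.
    // Assumes comma delimiters and optionally double-quoted fields.
    public static List<string[]> ParseCsv(string csvText)
    {
        var rows = new List<string[]>();

        using (var reader = new StringReader(csvText))
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Treat "test 6" as one field instead of splitting inside the quotes.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                // ReadFields returns the fields of the next line as a string array.
                rows.Add(parser.ReadFields());
            }
        }

        return rows;
    }
}
```

Calling CsvSketch.ParseCsv on the sample data from the question should give one string array per line, with "test 6" kept as a single field. Other CSV dialects (different delimiters or escaping rules) would need different settings.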

Oded
  • I can't imagine this being the best way to do this, but right now it's my only option. Thanks – Mark W Aug 06 '12 at 15:45
  • @MarkW - Best? In terms of what? Dev effort? Maintainability? Speed? Correct results? – Oded Aug 06 '12 at 15:53