
Possible Duplicate:
Parsing CSV files in C#

I have a C# application that parses a pipe-delimited file. It uses the Regex.Split method:

Regex.Split(line, @"(?<!(?<!\\)*\\)\|")

However, a data file recently arrived with a pipe inside one of its data fields. That field is enclosed in quotes (a quoted identifier), so the file opens correctly in Excel.

For example I have a file that looks like:

Field1|Field2|"Field 3 has a | inside the quotes"|Field4

When I use the regex above, the line is split into:

Field1
Field2
Field 3 has a
inside the quotes
Field4

when what I want is:

Field1
Field2
Field 3 has a | inside the quotes
Field4

I've done a fair amount of research but can't find a way to make Regex.Split split each line on pipes while respecting the quoted fields. Any help is greatly appreciated!

    Don't use regex to handle csv files, there exist csv parsers for this, see e.g. [this answer](http://stackoverflow.com/questions/2081418/parsing-csv-files-in-c-sharp) – stema Aug 13 '12 at 07:54
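
As a sketch of the parser route suggested in this comment, one parser that ships with .NET is Microsoft.VisualBasic.FileIO.TextFieldParser, which can be told to split on '|' and to treat double-quoted text as a single field. This assumes the Microsoft.VisualBasic assembly is available (it is part of .NET Framework and of .NET Core 3.0 and later); the PipeFileReader class and ParseAll method are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO; // TextFieldParser lives in the Microsoft.VisualBasic assembly

// Hypothetical helper: reads every record from a pipe-delimited source.
static class PipeFileReader
{
    public static List<string[]> ParseAll(TextReader reader)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("|");               // split on pipes...
            parser.HasFieldsEnclosedInQuotes = true; // ...but keep "a | b" together as one field
            while (!parser.EndOfData)
            {
                // ReadFields returns the fields of one record, with enclosing quotes removed.
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }
}
```

For a file on disk you could pass a StreamReader, or use the TextFieldParser(string path) constructor directly.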

1 Answer


Here is a quick expression I've thrown together that seems to do the trick:

"([^"]+)"|([^\|]+)
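
A minimal sketch of how this expression might be used from C#, assuming one line of the file is parsed at a time. Two points are my own reading rather than part of the answer: inside a C# verbatim string (@"...") a literal double quote is written as two double quotes (""), and because the pattern matches the fields themselves rather than the pipes between them, the sketch uses Regex.Matches instead of Regex.Split. QuotedPipeSplitter and SplitLine are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper built around the answer's pattern.
static class QuotedPipeSplitter
{
    // "([^"]+)"|([^\|]+) as a verbatim string: each literal quote is doubled.
    // Group 1 = contents of a quoted field, group 2 = an unquoted field.
    private static readonly Regex FieldPattern = new Regex(@"""([^""]+)""|([^\|]+)");

    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(line))
        {
            // Only one alternative participates in each match; take its text.
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return fields;
    }
}
```

Applied to the example line from the question, this yields Field1, Field2, Field 3 has a | inside the quotes, and Field4, with the surrounding quotes stripped. As the answer notes, empty fields are dropped.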

Your expression also seems to be doing something with backslashes (\), so you may need to extend this one for any other requirements you have. I've ignored them in my answer because they weren't explained in the question, and I can't provide a solution without knowing why they are there; they may not need to be there at all.

Also, my expression ignores empty fields (i.e. 1||2|3 comes out as 1, 2 and 3 only). I don't know whether that's what you need; if it isn't, let me know and I can change the expression to cater for that too.
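
One possible variant that keeps empty fields is sketched below; it is an assumption of mine, not the answerer's expression. Each field is anchored at the start of the line or immediately after a pipe, and the unquoted alternative may match zero characters. It relies on .NET moving forward one position after an empty match, so an empty field between two pipes is reported once. PipeSplitterKeepEmpty is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: like the answer's pattern, but empty fields are kept.
static class PipeSplitterKeepEmpty
{
    // (?:^|(?<=\|))   a field starts at the line start or right after a '|'
    // "([^"]*)"       group 1: a quoted field (quotes stripped, may be empty)
    // ([^|]*)         group 2: an unquoted field, possibly empty
    private static readonly Regex FieldPattern =
        new Regex(@"(?:^|(?<=\|))(?:""([^""]*)""|([^|]*))");

    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        foreach (Match m in FieldPattern.Matches(line))
        {
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return fields;
    }
}
```

With this pattern, 1||2|3 should come out as 1, an empty string, 2 and 3, and a trailing pipe produces a final empty field.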

Hope this helps anyway.

Anupheaus
  • I'm clearly clueless when it comes to regex. But when I put your example into C#, it won't compile. Here is the code: string[] parts = Regex.Split(line, @"([^"]+)"|([^\|]+); It throws a "newline in constant" error. – yak merchant Aug 14 '12 at 11:31
  • Ah, inside a verbatim (@) string a double quote has to be written as two double quotes (""), not escaped with a backslash, and you are also missing the closing quote and parenthesis. Try this: string[] parts = Regex.Split(line, @"""([^""]+)""|([^\|]+)"); – Anupheaus Aug 14 '12 at 11:46