This is the sample

"abc","abcsds","adbc,ds","abc"

Output should be

abc
abcsds
adbc,ds
abc
Zombo
  • What language are you using? Using split() isn't going to be exactly what you're looking for. – genio Oct 21 '09 at 20:26

4 Answers


Try this:

"(.*?)"

If you need to put this regex inside a string literal, don't forget to escape the quotes:

Regex re = new Regex("\"(.*?)\"");
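The pattern above is meant for collecting matches, not for splitting on. As a minimal Python sketch (assuming fields never contain embedded quotes), `re.findall` returns only the captured text between each pair of quotes. In C#, the analogous call would be `Regex.Matches`, reading `Groups[1]` from each match.

```python
import re

line = '"abc","abcsds","adbc,ds","abc"'

# findall returns just the captured group for every match, so the
# commas between the quoted fields never appear in the result.
fields = re.findall(r'"(.*?)"', line)

for field in fields:
    print(field)
```

Splitting on the same pattern instead treats each quoted field as a delimiter, which is why the commas and empty strings show up in the result.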
Rubens Farias
  • If I use this on a line read from the .csv file: string line = "\"abc\",\"abcsds\",\"adbc,ds\",\"abc\""; string[] abc = Regex.Split(line, "\"(.*?)\""); the output is:
    abc[0] = ""
    abc[1] = "abc"
    abc[2] = ","
    abc[3] = "abcsds"
    abc[4] = ","
    abc[5] = "adbc,ds"
    abc[6] = ","
    abc[7] = "abc"
    abc[8] = ""
    –  Oct 21 '09 at 19:53
  • Sorry, I didn't understand; can you please edit your question? – Rubens Farias Oct 21 '09 at 20:12
  • btw, what language are you using? – Rubens Farias Oct 21 '09 at 20:13
  • C# string line="\"abc\",\"abcsds\",\"adbc,ds\",\"abc\""; –  Oct 21 '09 at 23:06
  • string [] abc = Regex.Split(line,"\"(.*?)\""); –  Oct 21 '09 at 23:08
  • Output abc[0] ="", abc[1] ="abc" , abc[2] ="," , abc[3] ="abcsds" , abc[4] ="," , abc[5] ="adbc,ds" , abc[6] ="," , abc[7] ="abc" , abc[8] ="" – –  Oct 21 '09 at 23:09
  • Thanks. If I split on "\",\"" instead, it separates all the values inside the quotes, and I will replace the leftover quote characters with blanks. –  Oct 21 '09 at 23:20

This is a tougher job than you realize -- not only can there be commas inside the quotes, but there can also be quotes inside the quotes. Two consecutive quotes inside a quoted string do not signal the end of the string. Instead, they signal a quote embedded in the string, so for example:

"x", "y,""z"""

should be parsed as:

x
y,"z"

So, the basic sequence is something like this:

Find the first non-white-space character.
If it was a quote, read up to the next quote. Then read the next character.
    Repeat until that next character is not also a quote.
    If the next (non-whitespace) character is not a comma, input is malformed.
If it was not a quote, read up to the next comma.
Skip the comma, repeat the whole process for the next field.
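The steps above can be sketched as a small hand-written scanner. This is only an illustration in Python, not a full CSV implementation: the function name `parse_csv_line` is made up for the example, it handles a single line, and it assumes quoted fields do not span multiple lines.

```python
def parse_csv_line(line):
    """Split one CSV line into fields, following the steps listed above."""
    fields = []
    i, n = 0, len(line)
    while True:
        # Find the first non-white-space character.
        while i < n and line[i] in " \t":
            i += 1
        if i < n and line[i] == '"':
            i += 1  # step past the opening quote
            parts = []
            while True:
                j = line.find('"', i)  # read up to the next quote
                if j == -1:
                    raise ValueError("unterminated quoted field")
                parts.append(line[i:j])
                if j + 1 < n and line[j + 1] == '"':
                    parts.append('"')  # doubled quote: an embedded quote
                    i = j + 2
                else:
                    i = j + 1  # a single quote closes the field
                    break
            fields.append("".join(parts))
            # Only white space may sit between the closing quote and the comma.
            while i < n and line[i] in " \t":
                i += 1
            if i < n and line[i] != ",":
                raise ValueError("expected a comma after a quoted field")
        else:
            # Unquoted field: read up to the next comma (or end of line).
            j = line.find(",", i)
            if j == -1:
                j = n
            fields.append(line[i:j])
            i = j
        if i >= n:
            return fields
        i += 1  # skip the comma and repeat for the next field


print(parse_csv_line('"x", "y,""z"""'))
```

Note that in this sketch leading white space is skipped for unquoted fields but trailing white space is kept; whether that is desirable depends on the data.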

Note that despite the tag, I'm not providing a regex -- I'm not at all sure I've seen a regex that can really handle this properly.

Jerry Coffin

If you can be sure there are no inner, escaped quotes, then I guess it's ok to use a regular expression for this. However, most modern languages already have proper CSV parsers.

Using a proper parser is the correct answer here: Text::CSV for Perl, for example.
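As an illustration of the parser route, here is a short sketch using Python's standard `csv` module in place of Text::CSV (the choice of Python is an assumption made for this example; the thread itself mentions Perl and C#). The `skipinitialspace=True` option is only needed when a space follows the separating comma.

```python
import csv

line = '"abc","abcsds","adbc,ds","abc"'

# csv.reader accepts any iterable of lines; a one-element list is enough here.
row = next(csv.reader([line]))
print(row)

# Doubled quotes inside a quoted field are unescaped by default.
tricky = '"x", "y,""z"""'
tricky_row = next(csv.reader([tricky], skipinitialspace=True))
print(tricky_row)
```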

However, if you're dead set on using regular expressions, I'd suggest you "borrow" from some sort of module, like this one: http://metacpan.org/pod/Regexp::Common::balanced

szabgab
genio

This answer has a C# solution for dealing with CSV.

In particular, the line

private static Regex rexCsvSplitter = new Regex( @",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))" );

contains the Regex used to split properly, i.e., taking quoting and escaping into consideration.

Basically what it says is, match any comma that is followed by an even number of quote marks (including zero). This effectively prevents matching a comma that is part of a quoted string, since the quote character is escaped by doubling it.

Keep in mind that the quotes in the above line are doubled for the sake of the string literal. It might be easier to think of the expression as

,(?=(?:[^"]*"[^"]*")*(?![^"]*"))
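The same pattern can be tried outside C#. The following is a hedged Python sketch: the helper `unquote` is hypothetical and not part of the linked answer. It strips the surrounding quotes and collapses doubled quotes after the split, since the regex only finds the separating commas.

```python
import re

# A comma followed by an even number of quotes up to the end of the line.
CSV_SPLITTER = re.compile(r',(?=(?:[^"]*"[^"]*")*(?![^"]*"))')


def unquote(field):
    """Hypothetical helper: drop surrounding quotes, un-double embedded ones."""
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        return field[1:-1].replace('""', '"')
    return field


line = '"abc","abcsds","adbc,ds","abc"'
raw = CSV_SPLITTER.split(line)          # the pieces still carry their quotes
fields = [unquote(f) for f in raw]
print(fields)
```

Because the lookahead rescans the rest of the line for every comma, this approach can be slow on very long lines; a dedicated parser avoids that.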
Community
harpo