63

I know this (or similar) has been asked many times but having tried out numerous possibilities I've not been able to find a a regex that works 100%.

I've got a CSV file and I'm trying to split it into an array, but encountering two problems: quoted commas and empty elements.

The CSV looks like:

123,2.99,AMO024,Title,"Description, more info",,123987564

The regex I've tried to use is:

thisLine.split(/,(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/)

The only problem is that in my output array the 5th element comes out as 123987564 and not an empty string.

Interrobang
  • 16,984
  • 3
  • 55
  • 63
Code Ninja
  • 727
  • 1
  • 7
  • 9
  • 2
    It seems like JS. What about using a [proper parser](http://stackoverflow.com/questions/1293147/javascript-code-to-parse-csv-data) ? Also please specify the language you're using, that will prevent a lot of guess work. – HamZa Aug 09 '13 at 10:27
  • 3
    Aside from an exercise in curiosity, what made you think of using regex? – Tony Hopkinson Aug 09 '13 at 10:30
  • Its classic asp (using jscript), and I figured that a regex to process the data before dealing with it would be the simplest way to extract the data. – Code Ninja Aug 09 '13 at 10:45
  • 2
    *"I've not been able to find a a regex that works 100%"* That's mainly because this is a job for a CSV parser, and you should use one. Try this one, for example. https://github.com/gkindel/CSV-JS – Tomalak Aug 09 '13 at 10:49
  • Regular expressions are not well-suited to solve this kind of problems, a simple parser will work much better. And of course using an existing parser is the easier than writing your own, see Tomalak's answer. – Zane Aug 09 '13 at 13:50
  • Check out my answer to a similar question: [How can I parse a CSV string with Javascript?](http://stackoverflow.com/a/8497474/433790) – ridgerunner Aug 09 '13 at 13:56
  • Please add an example element in the array that uses double quotes, because that may always happen with free-form text. Example: `a "b" c` would end up in the CSV as `"a ""b"" c"`. Any good CSV parser needs to be able to handle this. – Thomas Tempelmann Sep 03 '15 at 19:34

18 Answers18

77

Description

Instead of using a split, I think it would be easier to simply execute a match and process all the found matches.

This expression will:

  • divide your sample text on the comma delimits
  • will process empty values
  • will ignore double quoted commas, providing double quotes are not nested
  • trims the delimiting comma from the returned value
  • trims surrounding quotes from the returned value
  • if the string starts with a comma, then the first capture group will return a null value

Regex: (?:^|,)(?=[^"]|(")?)"?((?(1)[^"]*|[^,"]*))"?(?=,|$)

enter image description here

Example

Sample Text

123,2.99,AMO024,Title,"Description, more info",,123987564

ASP example using the non-java expression

Set regEx = New RegExp
regEx.Global = True
regEx.IgnoreCase = True
regEx.MultiLine = True
sourcestring = "your source string"
regEx.Pattern = "(?:^|,)(?=[^""]|("")?)""?((?(1)[^""]*|[^,""]*))""?(?=,|$)"
Set Matches = regEx.Execute(sourcestring)
  For z = 0 to Matches.Count-1
    results = results & "Matches(" & z & ") = " & chr(34) & Server.HTMLEncode(Matches(z)) & chr(34) & chr(13)
    For zz = 0 to Matches(z).SubMatches.Count-1
      results = results & "Matches(" & z & ").SubMatches(" & zz & ") = " & chr(34) & Server.HTMLEncode(Matches(z).SubMatches(zz)) & chr(34) & chr(13)
    next
    results=Left(results,Len(results)-1) & chr(13)
  next
Response.Write "<pre>" & results

Matches using the non-java expression

Group 0 gets the entire substring which includes the comma
Group 1 gets the quote if it's used
Group 2 gets the value not including the comma

[0][0] = 123
[0][1] = 
[0][2] = 123

[1][0] = ,2.99
[1][1] = 
[1][2] = 2.99

[2][0] = ,AMO024
[2][1] = 
[2][2] = AMO024

[3][0] = ,Title
[3][1] = 
[3][2] = Title

[4][0] = ,"Description, more info"
[4][1] = "
[4][2] = Description, more info

[5][0] = ,
[5][1] = 
[5][2] = 

[6][0] = ,123987564
[6][1] = 
[6][2] = 123987564

Edited

As Boris pointed out CSV format will escape a double quote " as a double double quote "". Although this requirement wasn't included by the OP, if your text includes double double quotes then you'll want to use a this modified expression:

Regex: (?:^|,)(?=[^"]|(")?)"?((?(1)(?:[^"]|"")*|[^,"]*))"?(?=,|$)

See also: https://regex101.com/r/y8Ayag/1

It should also be pointed out that Regex is a pattern matching tool not a parsing engine. Therefore if your text includes double double quotes it will still contain the double double quotes after pattern matching is completed. With this solution you'd still need to search for the double double quotes and replace them in your captured text.

Ro Yo Mi
  • 14,790
  • 5
  • 35
  • 43
  • 11
    What software/website are you using to generate those diagrams please ? – Ibrahim Najjar Aug 09 '13 at 13:16
  • Wonderful! It does, however, fail to parse lines like ,,5 correctly - the first empty field is missing. Not sure how to fix this, though. :-( And it is broken on a,",",a – Dr. Hans-Peter Störr Mar 14 '14 at 09:36
  • I'm sure there was an undocumented flaw in Java's implementation of Regex, so I removed the Java portion. Since the OP was looking for something in ASP this still works even on your samples. Would you be able to provide links to a non-work example? – Ro Yo Mi Mar 18 '14 at 21:19
  • I don't think this work with escaped quotes `abc,"hello \\"test ", 123` – caub Aug 20 '16 at 07:08
  • 1
    Correct this will not honor escaped quotes, but that is technically out of scope. – Ro Yo Mi Nov 02 '16 at 12:45
  • 1
    @RoYoMi, your regex is perfect! Can you adapt to POSIX regex? So we can use into a database like PostgreSQL, function [regexp_matches(string, regex)](https://www.postgresql.org/docs/current/static/functions-string.html) produce the array as return, but use [this limited regex syntax](https://www.postgresql.org/docs/current/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP). – Peter Krauss Mar 21 '17 at 11:17
  • @PeterKrauss I'm sure it can be done, I'm not familiar enough with POSIX to give a quality translation though. – Ro Yo Mi Mar 24 '17 at 12:43
  • Skips the first field in `"abc""def",ghi`. – Rei Miyasaka May 02 '17 at 21:50
  • 1
    @ReiMiyasaka, you are only partially correct this won't work with nested quotes, however nested quotes was not part of the original problem. The fact that this won't work with nested quotes was already disclosed in my third bullet point `providing double quotes are not nested`. – Ro Yo Mi May 03 '17 at 03:57
  • 1
    I've adjusted your formular a bit so that it matches everything perfectly for me. (?:^|,)(?=[^"]|(")?)"?((?(1).*?(?=",)|[^,]*))(?=|$) -> the part .*?(?=",) instead of [^"]* matches all characters except an upcoming ", after the first group has positivly matched. If first group matches the string does start with a quote and should also end like it. GL and i hope this helps also someone else. – Erik Jun 26 '18 at 15:09
  • 1
    use this string literal in C#.net `(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^\"]*))\"?(?=,|$)` It can be validated at http://regexstorm.net/tester and **be aware** that Visual Studio likes to 'format' it, **breaking** it by entering a bunch of unnecessary spaces. – Sean Munson Jan 15 '19 at 19:47
  • 1
    This doesn't parse the CSV format as specified in [RFC 4180](https://tools.ietf.org/html/rfc4180). For example `"""test"" test",` should be parsed as `"test" test,` because 2 `"` are used to represent one `"` character, but it's not detected. – Boris Verkhovskiy Nov 17 '20 at 19:00
  • 1
    It's worth noting that awk and sed can't handle this level of regex, but perl does, so you could do the following in bash. `cat mydata.csv|perl -ne 's/(?:^|,)(?=[^"]|(")?)"?((?(1)(?:[^"]|"")*|[^,"]*))"?(?=,|$)/\2<\/data>/g; print;'` – user1754036 Nov 30 '22 at 18:56
  • @IbrahimNajjar maybe https://draw.io and another one easier/faster https://mermaid-js.github.io – Zaman Feb 08 '23 at 07:48
45

Worked on this for a bit and came up with this solution:

(?:,|\n|^)("(?:(?:"")*[^"]*)*"|[^",\n]*|(?:\n|$))

Try it out here!

This solution handles "nice" CSV data like

"a","b",c,"d",e,f,,"g"

0: "a"
1: "b"
2: c
3: "d"
4: e
5: f
6:
7: "g"

and uglier things like

"""test"" one",test' two,"""test"" 'three'","""test 'four'"""

0: """test"" one"
1: test' two
2: """test"" 'three'"
3: """test 'four'"""

Here's an explanation of how it works:

(?:,|\n|^)      # all values must start at the beginning of the file,  
                #   the end of the previous line, or at a comma  
(               # single capture group for ease of use; CSV can be either...  
  "             # ...(A) a double quoted string, beginning with a double quote (")  
    (?:         #        character, containing any number (0+) of  
      (?:"")*   #          escaped double quotes (""), or  
      [^"]*     #          non-double quote characters  
    )*          #        in any order and any number of times  
  "             #        and ending with a double quote character  

  |             # ...or (B) a non-quoted value  

  [^",\n]*      # containing any number of characters which are not  
                # double quotes ("), commas (,), or newlines (\n)  

  |             # ...or (C) a single newline or end-of-file character,  
                #           used to capture empty values at the end of  
  (?:\n|$)      #           the file or at the ends of lines  
)
awwsmm
  • 1,353
  • 1
  • 18
  • 28
  • 2
    Great solution! Also works properly with .NET's Regex class. – Herman Cordes Mar 22 '18 at 10:29
  • 1
    @HermanCordes, glad it helps! I just removed some of the unnecessary non-capturing groups, so it should be a bit faster now. (Gist was updated but this post wasn't... oops!) – awwsmm Mar 22 '18 at 12:35
  • 2
    Appreciate this solution. PSA if you have an empty first column (ex. `,foo,bar`) it will not be captured. One workaround is to prepend empty quotes `""` to such a row before parsing. – adu May 22 '19 at 16:36
  • 2
    Perfect. All other answers produced incorrect results even with simple examples, but this one worked for all of my cases (inconsistent enclosures + commas inside values). – afilina Nov 02 '19 at 04:48
  • 1
    Still a very valuable answer, thank you! This works great with newlines in the values and (most importantly) quotation marks escaped by double-quotes (""). – BurninLeo Sep 23 '20 at 06:29
  • This doesn't parse the CSV format as specified in [RFC 4180](https://tools.ietf.org/html/rfc4180). The second "ugly" line you posted should be parsed as `['"test" one', "test' two", '"test" \'three\'', '"test \'four\'"']`, because 2 `"` are used to represent one `"` character. – Boris Verkhovskiy Nov 17 '20 at 18:55
  • 1
    Ok, this was the only one which worked for me. But how do I change it so the Quotes are removed? – Gener4tor Jul 19 '21 at 12:12
  • There's a recursive repetition in the middle of this regex which can lead to crashes. – Guillaume F. Nov 23 '21 at 17:34
  • In this solution if the first element in the comma delimited string is empty then it's not returned as a matched null value. This might be important if you need the columns to line up. – Ro Yo Mi Mar 09 '22 at 03:37
15

I'm late to the party, but the following is the Regular Expression I use:

(?:,"|^")(""|[\w\W]*?)(?=",|"$)|(?:,(?!")|^(?!"))([^,]*?)(?=$|,)|(\r\n|\n)

This pattern has three capturing groups:

  1. Contents of a quoted cell
  2. Contents of an unquoted cell
  3. A new line

This pattern handles all of the following:

  • Normal cell contents without any special features: one,2,three
  • Cell containing a double quote (" is escaped to ""): no quote,"a ""quoted"" thing",end
  • Cell contains a newline character: one,two\nthree,four
  • Normal cell contents which have an internal quote: one,two"three,four
  • Cell contains quotation mark followed by comma: one,"two ""three"", four",five

See this pattern in use.

If you have are using a more capable flavor of regex with named groups and lookbehinds, I prefer the following:

(?<quoted>(?<=,"|^")(?:""|[\w\W]*?)*(?=",|"$))|(?<normal>(?<=,(?!")|^(?!"))[^,]*?(?=(?<!")$|(?<!"),))|(?<eol>\r\n|\n)

See this pattern in use.

Edit

(?:^"|,")(""|[\w\W]*?)(?=",|"$)|(?:^(?!")|,(?!"))([^,]*?)(?=$|,)|(\r\n|\n)

This slightly modified pattern handles lines where the first column is empty as long as you are not using Javascript. For some reason Javascript will omit the second column with this pattern. I was unable to correctly handle this edge-case.

bublebboy
  • 315
  • 2
  • 6
  • Of all the posted solutions, this one works the best for me. It stands up to various edge-cases. However, it doesn't handle a string that starts with a comma. For example, ",second,third" should produce 3 matches, but only produces 2. – Suraj Apr 17 '18 at 00:01
  • @bubleboy - I love your regex, but it does not seem to handle cases where the last column of a csv is empty `..., column5,` Is there a way to tweak that regex to catch those trailing columns? – RHarris Nov 01 '18 at 12:13
  • @RHarris - Thank you. I tested the pattern and it was getting the empty last column. What language are you using? Not all regex implementations are the same. The language may have different requirements. – bublebboy Nov 02 '18 at 19:12
  • @RHarris - I tried this in C# and it did work as expected: `var pat = new System.Text.RegularExpressions.Regex(@"(?:^""|,"")(""""|[\w\W]*?)(?="",|""$)|(?:^(?!"")|,(?!""))([^,]*?)(?=$|,)|(\r\n|\n)", System.Text.RegularExpressions.RegexOptions.Multiline);` `var all = pat.Matches(",one,two,\"lets test, some \"\"quotes\"\"\",three,");` This resulted in `MatchCollection(5) { [], [,two], [,"lets test, some ""quotes""], [,three], [,] }` – bublebboy Nov 06 '18 at 14:16
  • Sorry, realized that it was my TextReader.ReadLine() that was causing the issue. Second to last column had `\r\n`. (e.g. `...",three\r\n and some further text,". I was never actually getting that last column. Thanks for your help. – RHarris Nov 06 '18 at 15:53
  • Your regexp fails on this simple test `",","value2"` – Nickolay Savchenko Dec 14 '21 at 10:08
13

I created this a few months ago for a project.

 ".+?"|[^"]+?(?=,)|(?<=,)[^"]+

Regular expression visualization

It works in C# and the Debuggex was happy when I selected Python and PCRE. Javascript doesn't recognize this form of Proceeded By ?<=....

For your values, it will create matches on

123
,2.99
,AMO024
,Title
"Description, more info"
,
,123987564

Note that anything in quotes doesn't have a leading comma, but attempting to match with a leading comma was required for the empty value use case. Once done, trim values as necessary.

I use RegexHero.Net to test my Regex.

scott.smart
  • 538
  • 5
  • 16
  • 2
    Though the example in the question doesn't mention it, a perfect regex algorithm needs to also deal with quote chars inside a field, like in: `single,"quoted","with ""quotes""",end`. Yours doesn't, yet. – Thomas Tempelmann Sep 03 '15 at 19:32
  • I also wonder about the first "?" past the "+" - that looks superfluous to me. And I had to change it to `"[^"]+"|[^"]+?(?=,)|(?<=,)[^"]+` or it wouldn't correctly scan quoted fields with my version of regex (in Real Studio 2012). – Thomas Tempelmann Sep 03 '15 at 19:39
  • 2
    The "?" after the "+" assigns non-greedy status, it will grab as much as it can up to the first instance of the next character. If the original post had two quoted values, for instance, not using the question mark could grab all text between the first and last instance of the quotes. Double quotes is difficult, I'll see if i can figure out a solution. – scott.smart Sep 03 '15 at 19:54
  • If the double quotes are inside, one can scan over them easily with the formula `(?:")`, but that doesn't work at the end of a quoted field, unfortunately. Looking at the linked Q I saw that escaping quotes by doubling them is also not universal - Apple's Numbers does that but other apps may instead escape with `\"`. Also: Empty fields at the very start or end didn't work. For the start I'm using a special case in my code, for the end I added `|,$` to the regex. – Thomas Tempelmann Sep 03 '15 at 20:13
  • ,?".+?"|[^"]+?(?=,)|[^"]+ Just a small modification, this works pretty good now! – zerocool Oct 14 '16 at 20:41
10

I needed this answer too, but I found the answers, while informative, a little hard to follow and replicate for other languages. Here is the simplest expression I came up with for a single column out of the CSV line. I am not splitting. I'm building a regex to match a column out of the CSV so I'm not splitting the line:

("([^"]*)"|[^,]*)(,|$)

This matches a single column from the CSV line. The first portion "([^"]*)" of the expression is to match a quoted entry, the second part [^,]* is to match a non-quoted entry. Then either followed by a , or end of line $.

And the accompanying debuggex to test out the expression.

https://www.debuggex.com/r/s4z_Qi2gZiyzpAhx

chubbsondubs
  • 37,646
  • 24
  • 106
  • 138
  • 3
    And it works in JavaScript (which is not what the OP asked, but it is helpful to know). – Michael Plautz Aug 23 '16 at 13:53
  • You should make sure the line does not end with a \r or \n (or \r\n) if you're using this. – Mr.WorshipMe Apr 04 '18 at 07:53
  • 2
    Doesn't handle escaped double-quotes ("") – Tamir Daniely Dec 24 '18 at 18:16
  • It does handle both double ("") and escaped (\") quotes in the way that this regex returns a full field, either quoted or not quoted, separated by comma. The handling of double and escaped quotes is still needed after fetching a field, but the main task, getting the complete field, is properly achieved with this concise regex. If you use this regex in programming code, you can easily perform a replace operation for the double or escaped afterwards (but only if the found string starts with a quote sign. – Thomas Tempelmann Feb 17 '21 at 12:02
5

I personally tried many RegEx expressions without having found the perfect one that match all cases.

I think that regular expressions is hard to configure properly to match all cases properly. Although few persons will not like the namespace (and I was part of them), I propose something that is part of the .Net framework and give me proper results all the times in all cases (mainly managing every double quotes cases very well):

Microsoft.VisualBasic.FileIO.TextFieldParser

Found it here: StackOverflow

Example of usage:

TextReader textReader = new StringReader(simBaseCaseScenario.GetSimStudy().Study.FilesToDeleteWhenComplete);
Microsoft.VisualBasic.FileIO.TextFieldParser textFieldParser = new TextFieldParser(textReader);
textFieldParser.SetDelimiters(new string[] { ";" });
string[] fields = textFieldParser.ReadFields();
foreach (string path in fields)
{
    ...

Hope it could help.

Community
  • 1
  • 1
Eric Ouellet
  • 10,996
  • 11
  • 84
  • 119
4

In Java this pattern ",(?=([^\"]*\"[^\"]*\")*(?![^\"]*\"))" almost work for me:

String text = "\",\",\",,\",,\",asdasd a,sd s,ds ds,dasda,sds,ds,\"";
String regex = ",(?=([^\"]*\"[^\"]*\")*(?![^\"]*\"))";
Pattern p = Pattern.compile(regex);
String[] split = p.split(text);
for(String s:split) {
    System.out.println(s);
}

output:

","
",a,,"

",asdasd a,sd s,ds ds,dasda,sds,ds,"

Disadvantage: not work, when column have an odd number of quotes :(

marioosh
  • 27,328
  • 49
  • 143
  • 192
3

The advantage of using JScript for classic ASP pages is that you can use one of the many, many libraries that have been written for JavaScript.

Like this one: https://github.com/gkindel/CSV-JS. Download it, include it in your ASP page, parse CSV with it.

<%@ language="javascript" %>

<script language="javascript" runat="server" src="scripts/csv.js"></script>
<script language="javascript" runat="server">

var text = '123,2.99,AMO024,Title,"Description, more info",,123987564',
    rows = CSV.parse(line);

    Response.Write(rows[0][4]);
</script>
Tomalak
  • 332,285
  • 67
  • 532
  • 628
3

Aaaand another answer here. :) Since I couldn't make the others quite work.

My solution both handles escaped quotes (double occurrences), and it does not include delimiters in the match.

Note that I have been matching against ' instead of " as that was my scenario, but simply replace them in the pattern for the same effect.

Here goes (remember to use the "ignore whitespace" flag /x if you use the commented version below) :

# Only include if previous char was start of string or delimiter
(?<=^|,)
(?:
  # 1st option: empty quoted string (,'',)
  '{2}
  |
  # 2nd option: nothing (,,)
  (?:)
  |
  # 3rd option: all but quoted strings (,123,)
  # (included linebreaks to allow multiline matching)
  [^,'\r\n]+
  |
  # 4th option: quoted strings (,'123''321',)
  # start pling
  ' 
    (?:
      # double quote
      '{2}
      |
      # or anything but quotes
      [^']+
    # at least one occurance - greedy
    )+
  # end pling
  '
)
# Only include if next char is delimiter or end of string
(?=,|$)

Single line version:

(?<=^|,)(?:'{2}|(?:)|[^,'\r\n]+|'(?:'{2}|[^']+)+')(?=,|$)

Regular expression visualization (if it works, debux has issues right now it seems - else follow the next link)

Debuggex Demo

regex101 example

Johny Skovdal
  • 2,038
  • 1
  • 20
  • 36
2

If you know that you won't have an empty field (,,) then this expression works well:

("[^"]*"|[^,]+)

As in the following example...

Set rx = new RegExp
rx.Pattern = "(""[^""]*""|[^,]+)"
rx.Global = True
Set col = rx.Execute(sText)
For n = 0 to col.Count - 1
    if n > 0 Then s = s & vbCrLf
    s = s & col(n)
Next

However, if you anticipate an empty field and your text is relatively small than you might consider replacing the empty fields with a space prior to parsing to ensure that they are captured. For example...

...
Set col = rx.Execute(Replace(sText, ",,", ", ,"))
...

And if you need to maintain the integrity of the fields, you can restore the commas and test for empty spaces inside the loop. This may not be the most efficient method but it gets the job done.

k rey
  • 611
  • 4
  • 11
  • This is the one that resolved my issue. I don't support empty fields so that allows me to grab everything whereas if I had quotes before in any field it caused complications. – Imbaker1234 Apr 09 '20 at 04:42
2

Yet another answer with a few extra features like support for quoted values that contain escaped quotes and CR/LF characters (single values that span multiple lines).

NOTE: Though the solution below can likely be adapted for other regex engines, using it as-is will require that your regex engine treats multiple named capture groups using the same name as one single capture group. (.NET does this by default)


When multiple lines/records of a CSV file/stream (matching RFC standard 4180) are passed to the regular expression below it will return a match for each non-empty line/record. Each match will contain a capture group named Value that contains the captured values in that line/record (and potentially an OpenValue capture group if there was an open quote at the end of the line/record).

Here's the commented pattern (test it on Regexstorm.net):

(?<=\r|\n|^)(?!\r|\n|$)                       // Records start at the beginning of line (line must not be empty)
(?:                                           // Group for each value and a following comma or end of line (EOL) - required for quantifier (+?)
  (?:                                         // Group for matching one of the value formats before a comma or EOL
    "(?<Value>(?:[^"]|"")*)"|                 // Quoted value -or-
    (?<Value>(?!")[^,\r\n]+)|                 // Unquoted value -or-
    "(?<OpenValue>(?:[^"]|"")*)(?=\r|\n|$)|   // Open ended quoted value -or-
    (?<Value>)                                // Empty value before comma (before EOL is excluded by "+?" quantifier later)
  )
  (?:,|(?=\r|\n|$))                           // The value format matched must be followed by a comma or EOL
)+?                                           // Quantifier to match one or more values (non-greedy/as few as possible to prevent infinite empty values)
(?:(?<=,)(?<Value>))?                         // If the group of values above ended in a comma then add an empty value to the group of matched values
(?:\r\n|\r|\n|$)                              // Records end at EOL

Here's the raw pattern without all the comments or whitespace.
(?<=\r|\n|^)(?!\r|\n|$)(?:(?:"(?<Value>(?:[^"]|"")*)"|(?<Value>(?!")[^,\r\n]+)|"(?<OpenValue>(?:[^"]|"")*)(?=\r|\n|$)|(?<Value>))(?:,|(?=\r|\n|$)))+?(?:(?<=,)(?<Value>))?(?:\r\n|\r|\n|$)

[Here is a visualization from Debuggex.com][3] (capture groups named for clarity): ![Debuggex.com visualization][4]

Examples on how to use the regex pattern can be found on my answer to a similar question here, or on C# pad here, or here.

Community
  • 1
  • 1
David Woodward
  • 1,265
  • 11
  • 20
  • ReGexp does detect some values if you have string like this " "a",",b",c,"d,D",e,f,,"g"" – Mindaugas Jaraminas Jul 08 '19 at 13:40
  • You are correct. The regex does detect values. Are you saying it shouldn't? If I use the [regexstorm.net](http://regexstorm.net/tester) link in my answer to test your string without the outer quotes ("a",",b" ... "g") I get 8 matches in the "Table" tab as I'd expect which are: [a] [,b] [c] [d,D] [e] [f] [] [g] If the outer quotes are included then it is invalid csv because the quotes are not properly escaped. – David Woodward Jul 09 '19 at 21:58
2

I use this expression. It takes into account a space after a comma, which I have come across.

(?:,"|^"|, ")(""|[\w\W]*?)(?=",|"$)|(?:,(?!")|^(?!"))([^,]*?)(?=$|,)|(\r\n|\n)
Keith Aymar
  • 876
  • 7
  • 10
1

I'm using this one, it works with coma separator and double quote escaping. Normally that's should solved your problem :

/(?<=^|,)(\"(?:[^"]+|"")*\"|[^,]*)(?:$|,)/g
0

I had a similar need for splitting CSV values from SQL insert statements.

In my case, I could assume that strings were wrapped in single quotations and numbers were not.

csv.split(/,((?=')|(?=\d))/g).filter(function(x) { return x !== '';});

For some probably obvious reason, this regex produces some blank results. I could ignore those, since any empty values in my data were represented as ...,'',... and not ...,,....

peter
  • 6,067
  • 2
  • 33
  • 44
0

If i try the regex posted by @chubbsondubs on http://regex101.com using the 'g' flag, there are matches, that contain only ',' or an empty string. With this regex:
(?:"([^"]*)"|([^,]*))(?:[,])
i can match the parts of the CSV (inlcuding quoted parts). (The line must be terminated with a ',' otherwise the last part isn't recognized.)
https://regex101.com/r/dF9kQ8/4
If the CSV looks like:
"",huhu,"hel lo",world,
there are 4 matches:
''
'huhu'
'hel lo'
'world'

deemon
  • 133
  • 1
  • 6
0
,?\s*'.+?'|,?\s*".+?"|[^"']+?(?=,)|[^"']+  

This regex works with single and double quotes and also for one quote inside another!

zerocool
  • 316
  • 1
  • 4
  • 19
0

This one matches all i need in c#:

(?<=(^|,)(?<quote>"?))([^"]|(""))*?(?=\<quote>(?=,|$))
  • strips quotes
  • lets new lines
  • lets double quotes in the quoted string
  • lets commas in the quoted string
-12

The correct regular expression to match a single quoted value with escaped [doubled] single quotes in it is:

'([^n']|(''))+'
Brad Larson
  • 170,088
  • 45
  • 397
  • 571