I have a regular expression to find values between quotes:

([\"'])(?:\\\1|.)*?\1

This works fine. However, if a doubled double quote ("") appears inside a quoted value, it fails and splits the value there. For example:

"value1","value2","value with "" is here","value4"

I need output like

value1
value2
value with "" is here
value4

That is, if a doubled double quote appears inside a value, it should be returned as part of that value. Can anyone help with this?

Cristian Lupascu
Sameers Javed

2 Answers

My first idea was to allow double quotes by adding them to your alternation:

([\"'])(?:\\\1|\1\1|.)*?\1

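However, since you've made your quantifier lazy, this will still not quite work: the match stops at the first quote it can, so the first quote of a doubled pair is taken as the closing quote. Better to use a greedy quantifier and make it explicit that unescaped quotes are not permitted between quotes: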

([\"'])(?:\\\1|\1\1|(?!\1).)*\1

See it on regex101.

Explanation:

([\"'])   # Match a quote, remember which kind in group 1.
(?:       # Start non-capturing group:
 \\\1     # Either match a backslash-escaped quote
|         # or
 \1\1     # a doubled quote
|         # or
 (?!\1)   # (as long as it's not a quote)
 .        # any character.
)*        # Repeat as necessary
\1        # Match a corresponding quote
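
To see the difference between the two variants, here is a minimal sketch in Python (the question doesn't name a language, so Python's `re` module is an assumption, and `quoted_values` is a hypothetical helper name). It runs the lazy and the greedy pattern over the sample line and strips the outer quotes from each match:

```python
import re

# Lazy variant (first idea) and greedy variant with the negative lookahead.
LAZY = re.compile(r"""(["'])(?:\\\1|\1\1|.)*?\1""")
GREEDY = re.compile(r"""(["'])(?:\\\1|\1\1|(?!\1).)*\1""")

def quoted_values(pattern, text):
    """Return every match of `pattern` in `text` without its outer quotes."""
    return [m.group(0)[1:-1] for m in pattern.finditer(text)]

line = '"value1","value2","value with "" is here","value4"'

print(quoted_values(LAZY, line))
# ['value1', 'value2', 'value with ', ' is here', 'value4']
print(quoted_values(GREEDY, line))
# ['value1', 'value2', 'value with "" is here', 'value4']
```

The lazy pattern closes the third value at the first quote of the doubled pair, while the greedy pattern consumes the pair as a unit and keeps the value whole.
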
Tim Pietzcker
  • Yes, it was part of a CSV file. The expression you provided, '([\"'])(?:\\\1|\1\1|(?!\1).)*\1', works; the other expression doesn't. I am glad and thankful to have a solution. – Sameers Javed Oct 25 '13 at 09:28

Your input looks like a CSV record, in which a literal quote is escaped by adding another quote. Are you saying you can also escape a quote with a backslash? I've never seen that; it's usually one or the other. And I've never seen a CSV variant that let you mix single quotes (apostrophes) and double quotes as delimiters in the same record. It's possible you're making this more complicated than it needs to be.

Assuming only double-quotes are recognized as field delimiters, and that they can only be escaped by adding another quote, matching a field is as simple as can be:

(?:"[^"]*")+
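
As a rough illustration, assuming Python's `re` module (the question doesn't say which language is in use), the pattern can be applied to the sample record like this. The helper name `csv_fields` and its `unescape` flag are made up for the example:

```python
import re

# One field: one or more adjacent "..." runs, so "" inside a field is absorbed.
FIELD = re.compile(r'(?:"[^"]*")+')

def csv_fields(record, unescape=False):
    """Return the quoted fields of `record` without their outer quotes.

    With unescape=True, each doubled quote is collapsed to a single one.
    """
    fields = [m.group(0)[1:-1] for m in FIELD.finditer(record)]
    if unescape:
        fields = [f.replace('""', '"') for f in fields]
    return fields

record = '"value1","value2","value with "" is here","value4"'
print(csv_fields(record))
# ['value1', 'value2', 'value with "" is here', 'value4']
```

Passing `unescape=True` would turn the third field into `value with " is here`, which is how a CSV reader normally interprets it.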

The backslash-escape version is a little more complicated:

"[^"\\]*(?:\\.[^"\\]*)*"

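A small sketch of the backslash-escape pattern in action, again assuming Python's `re` module. The sample record is invented for illustration, since the question's data only uses doubled quotes:

```python
import re

# Opening quote, then runs of ordinary characters separated by
# backslash-escaped characters, then the closing quote.
BACKSLASH_FIELD = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

# Hypothetical record in which quotes are escaped with a backslash.
record = r'"a","b \"quoted\" c","d"'

matches = BACKSLASH_FIELD.findall(record)
for field in matches:
    print(field)
# "a"
# "b \"quoted\" c"
# "d"
```

Because the pattern contains no capturing groups, `findall` returns each whole match, outer quotes included.
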
If single-quote delimiters are allowed as well, the easiest way is to add another alternative to either pattern:

(?:"[^"]*")+|(?:'[^']*')+

"[^"\\]*(?:\\.[^"\\]*)*"|'[^'\\]*(?:\\.[^'\\]*)*'

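For instance, the doubled-quote alternation could be exercised as follows. This is only a sketch assuming Python's `re` module, and the mixed-quote record is invented for the example:

```python
import re

# Double-quoted fields or single-quoted fields, each escaping its own
# delimiter by doubling it.
EITHER_QUOTE = re.compile(r"""(?:"[^"]*")+|(?:'[^']*')+""")

# Hypothetical record mixing both delimiter styles.
record = "\"it's\",'say \"hi\"','don''t'"

fields = [m.group(0)[1:-1] for m in EITHER_QUOTE.finditer(record)]
for field in fields:
    print(field)
# it's
# say "hi"
# don''t
```

Each alternative only treats its own delimiter as special, so an apostrophe inside a double-quoted field (or a double quote inside a single-quoted one) is left alone.
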
And if you really need to support both kinds of quote and both kinds of escaping, see Tim's answer. But I'm extremely skeptical.

Alan Moore
  • Yes, it was part of a CSV file, and the expression you provided, (?:"[^"]*")+|(?:'[^']*')+, works. Both you and @time-pietzcker provided correct solutions. I am not sure how to mark both answers as "answer"; it seems I can mark only one. – Sameers Javed Oct 25 '13 at 09:32