4

I'm looking for a regex that can pull out quoted sections in a string, both single and double quotes.

For example:

"This is 'an example', \"of an input string\""

Matches:

  • an example
  • of an input string

I wrote up this:

 [\"|'][A-Za-z0-9\\W]+[\"|']

It works, but does anyone see any flaws in it?

EDIT: The main issue I see is that it can't handle nested quotes.

fletcher
FlySwat

5 Answers

3

How does it handle single quotes inside of double quotes (or vice versa)?

"This is 'an example', \"of 'quotes within quotes'\""

should match

  • an example
  • of 'quotes within quotes'

Use a backreference if you need to support this.

(\"|')[A-Za-z0-9\\W]+?\\1

EDIT: Fixed to use a reluctant quantifier.
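
As a rough illustration of the backreference idea, here is a minimal Python sketch. It assumes Python's re module rather than the C# string form used above, and the second capture group and the helper name quoted_sections are additions made only for this example, not part of the original pattern.

```python
import re

# Python rendering of the backreference pattern above. The second capture
# group is an addition for this sketch so the inner text is easy to extract.
QUOTED = re.compile(r"""("|')([A-Za-z0-9\W]+?)\1""")


def quoted_sections(text):
    """Return the text found between each matching pair of quotes."""
    return [m.group(2) for m in QUOTED.finditer(text)]


sample = "This is 'an example', \"of 'quotes within quotes'\""
for section in quoted_sections(sample):
    print(section)
```

Because \1 must repeat whichever quote character opened the match, the single quotes inside the double-quoted part do not end that match early.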

Bill the Lizard
  • This does not work for strings like this one: "foo foo \"match\" foo \"match\" foo", where it returns "\"match\" foo \"match\"" as the only match. – Tomalak Oct 16 '08 at 19:58
  • That's because \W is the non-word character class, not the whitespace class, as I thought. My memory's not what it used to be. – Bill the Lizard Oct 16 '08 at 20:15
  • No. :-) It is because the "+" greedily matches to the end of the string, before backtracking occurs and the last applicable quote is given to back-reference "\1". – Tomalak Oct 16 '08 at 20:20
  • And for that matter, with the "\s" you now have in place it is not going to match punctuation, or accented characters, or Greek characters, etc... – Tomalak Oct 16 '08 at 20:24
  • Okay, that's my fault. I misunderstood what was to be matched. I thought it was matching alphanumerics and spaces. So changing to a reluctant quantifier is the ticket here. – Bill the Lizard Oct 16 '08 at 20:40
1

Like that?

"([\"'])(.*?)\\1"

Your desired match would be in group 2, and the kind of quote in group 1.

The flaws in your regex are 1) the greedy "+" and 2) that [A-Za-z0-9] does not match very much; many characters fall outside that range.
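
To make the greedy versus lazy difference concrete, here is a small Python sketch (Python's re module is an assumption here; the quantifier behaviour is the same in the regex flavours discussed in this thread). It uses the test string from the comments on another answer, and the variable names are illustrative only.

```python
import re

text = 'foo foo "match" foo "match" foo'

# Greedy: ".*" runs to the end of the string, then backtracks to the LAST
# quote that can satisfy the backreference.
greedy = re.compile(r"""(["'])(.*)\1""")

# Lazy: ".*?" stops at the FIRST quote that can satisfy the backreference.
lazy = re.compile(r"""(["'])(.*?)\1""")

greedy_hits = [m.group(2) for m in greedy.finditer(text)]
lazy_hits = [m.group(2) for m in lazy.finditer(text)]

print(greedy_hits)
print(lazy_hits)
```

The greedy version returns a single match spanning from the first quote to the last one, while the lazy version returns each quoted word separately.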

Tomalak
0

It works, but it won't match every character that can appear inside quotes (e.g., underscores, or letters from other languages that fall outside A-Za-z). How about this:

[\"']([^\"']*)[\"']

My C# regex is a little rusty so go easy on me if that's not exactly right :)
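
For comparison, here is a minimal Python sketch of the negated-character-class approach (Python's re module is assumed; the names below are made up for the example). It also shows the trade-off raised elsewhere in this thread: since neither quote character may appear inside a match, a quote of one kind nested inside the other splits the result.

```python
import re

# Negated character class: the captured text may not contain either quote
# character, so the match can never run past a closing quote.
NO_QUOTES_INSIDE = re.compile(r"""["']([^"']*)["']""")

simple = "This is 'an example', \"of an input string\""
nested = "This is 'an example', \"of 'quotes within quotes'\""

simple_hits = NO_QUOTES_INSIDE.findall(simple)
nested_hits = NO_QUOTES_INSIDE.findall(nested)

print(simple_hits)
print(nested_hits)
```

On the simple input this finds both quoted sections. On the nested input the second section is cut off at the inner single quote, so a backreference is still needed if mixed nesting has to be supported.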

Chris Bunch
  • That doesn't return any matches at all. – FlySwat Oct 16 '08 at 19:23
  • I changed it to use the parens instead of the [], since I think it was thinking the period as a literal period instead of wildcard. I tested it out in Ruby with your example string and it seems to match them fine. – Chris Bunch Oct 16 '08 at 19:26
  • But the greedy start runs over any quotes there are and you will get the longest match, but not the right match. – Tomalak Oct 16 '08 at 19:28
  • In that case, the first match also contains the rest of the string in my test string – FlySwat Oct 16 '08 at 19:29
  • ah, i missed that one. this regex seems to work better: just capture everything that's not a quote – Chris Bunch Oct 16 '08 at 19:33
0
@"(""|')(.*?)\1"
jfs
0

You might already have one of these, but, in case not, here's a free, open source tool I use all the time to test my regular expressions. I typically have the general idea of what the expression should look like, but need to fiddle around with some of the particulars.

http://renschler.net/RegexBuilder/

joshua.ewer