I can't seem to figure out a regular expression that splits this line of text:

'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'

into these groupings:

SOME_TEXT
EVEN_MORE_TEXT
EXPRESSION IS IN ('YES','NO')

I'd rather have a nifty regex than solve this with string functions like indexOf(), etc.

Kman
  • can you put backticks around your groupings to make it easier for us to tell them apart? – lunixbochs Dec 08 '11 at 20:03
  • Sorry! I've edited the text to be more readable. Thanks :) – Kman Dec 08 '11 at 20:09
  • This is somewhat similar to [that famous question](http://stackoverflow.com/q/1732348/192510). You need to overcome the problem of nested delimiters (comma and quote). Sometimes they act as delimiters, sometimes they are just part of the text. Separating the two is beyond normal regular expression capability - although some Regex engines have additional capabilities, it isn't always easy to understand what you are getting in a general sense so you don't always know if the Regex will give the correct answer *all the time*. If some of the time is good enough then go for it. – NealB Dec 08 '11 at 20:27
  • @NealB, Kman didn't mention anything about nesting. The language looks pretty regular to me: a string literal may contain any char except quotes, or if a quote is needed, it must be escaped by another quote. You simply "scan" through the input looking for the pattern `'([^']|'')++'` which will automatically skip over the commas. – Bart Kiers Dec 08 '11 at 20:32
  • @bart You're right... My mistake. – NealB Dec 08 '11 at 20:56

1 Answer

The regex `'([^']|'')+'` will match the parts you're interested in, as this demo shows:

$text = "'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'";
preg_match_all("/'([^']|'')+'/", $text, $matches);
print_r($matches[0]);

which prints:

Array
(
    [0] => 'SOME_TEXT'
    [1] => 'EVEN_MORE_TEXT'
    [2] => 'EXPRESSION IS IN (''YES'',''NO'')'
)
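
The matches above still carry their surrounding quotes and the doubled `''` escapes, whereas the question asks for the bare groupings. As a minimal sketch of that last step, here is the same pattern in Python (the question does not name a language, so Python is an assumption, and `split_quoted_fields` is a hypothetical helper name). It captures the inside of each quoted field and turns `''` back into `'`:

```python
import re

# Opening quote, then one or more non-quote characters or doubled quotes
# (the escape for a literal quote), then the closing quote.
# The capturing group holds the field body without the outer quotes.
FIELD = re.compile(r"'((?:[^']|'')+)'")


def split_quoted_fields(line):
    """Return each quoted field with outer quotes removed and '' unescaped."""
    return [m.group(1).replace("''", "'") for m in FIELD.finditer(line)]


text = "'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'"
fields = split_quoted_fields(text)
for field in fields:
    print(field)
```

Because the `+` requires at least one character between the quotes, this sketch assumes the input never contains an empty field such as `''`.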
Bart Kiers
  • +1. I guess the `++` is to avoid catastrophic backtracking yes? – FailedDev Dec 08 '11 at 20:21
  • I get an undefined result when testing the expression. Due to the ++ – Kman Dec 08 '11 at 20:23
  • Forgive my ignorance, but what's the advantage of using the possessive `++` if it's applied to the entire regex? – ean5533 Dec 08 '11 at 20:25
  • @Kman, ah, I have sticky fingers: I didn't want to include the possessive `++` because some regex flavors do not support them (and Kman didn't mention which s/he was using), _and_ the benefit of possessive matching is minimal, in this case. – Bart Kiers Dec 08 '11 at 20:29
  • @FailedDev, see my comment above. No, in this case, the `++` has little effect since `[^']` and `''` will never consume anything that they might possibly give up at a later stage. – Bart Kiers Dec 08 '11 at 20:30
  • Works like a charm!! Thank you! It all seems so simple when you get it laid out. I'm having a tough time wiring my brain to think in regular expressions – Kman Dec 08 '11 at 20:45
  • Actually, testing it again, I get 5 groups, not 3 (see photo attached to main question) – Kman Dec 08 '11 at 20:57
  • @Kman, you're matching the pattern `([^']|'')+` (without surrounding quotes). You should do as I showed you: `'([^']|'')+'` (including surrounding quotes) – Bart Kiers Dec 08 '11 at 21:01
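
The difference between the two patterns in the last comment can be checked with a short sketch. Python is an assumption here, since the thread never names the asker's language. Without the surrounding quotes, the separating commas satisfy `[^']` and match on their own, which accounts for the two extra groups:

```python
import re

text = "'SOME_TEXT','EVEN_MORE_TEXT','EXPRESSION IS IN (''YES'',''NO'')'"

# Without the surrounding quotes, each separating comma is itself a run of
# non-quote characters, so it shows up as a match of its own.
without_quotes = [m.group(0) for m in re.finditer(r"(?:[^']|'')+", text)]

# With the surrounding quotes, a match must start and end on a quote, so
# only the quoted fields match and the commas are skipped.
with_quotes = [m.group(0) for m in re.finditer(r"'(?:[^']|'')+'", text)]

print(len(without_quotes), without_quotes)
print(len(with_quotes), with_quotes)
```

The first list has five entries (the three field bodies plus two lone commas), and the second has the three quoted fields.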