
Dummy sample input that the regex will be matched against:

RECOVERY: 'XXXXXXXXX' is UP
PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is WARNING
PROBLEM: 'XXXXXXXXX' is DOWN
RECOVERY: 'ABABABAB' on 'XXXXXXXXX' is OK
PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is DOWN

Goal

Capture XXXXXXXXX (without the single quotes) but do NOT capture ABABABAB.

Best attempt so far:

(M: \'|Y: \')(.*)(?:\' )(?:is)

Is it even possible to achieve the goal above, and if so, how?

Anton Flärd

2 Answers


You can use a lookahead to check that the quoted string is followed by the word `is`:

'([^']*)'\s*(?=\bis\b)

See regex demo

Breakdown:

  • ' - a single apostrophe
  • ([^']*) - a capture group matching 0 or more characters other than '
  • '\s* - a single apostrophe and 0 or more whitespace symbols
  • (?=\bis\b) - a lookahead making sure the whole word `is` follows the current position (after the ' and the optional whitespace)

Java demo:

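import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Backslashes are doubled because the regex is written as a Java string literal
Pattern ptrn = Pattern.compile("'([^']*)'\\s*(?=\\bis\\b)");
Matcher matcher = ptrn.matcher("RECOVERY: 'XXXXXXXXX' is UP");
if (matcher.find()) {
    System.out.println(matcher.group(1)); // the text between the quotes
}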

UPDATE

I used a lookahead only because you used a non-capturing group in your original regex: (?:is). A non-capturing group that has no quantifier and no alternation inside is redundant and can be omitted. However, people are often misled by the name "non-capturing" into thinking that the substring matched by such a group is excluded from the overall match. It is not: the text is still consumed. To check for the presence or absence of some text without matching it, a lookaround should be used. Thus, I used a lookahead.
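To make the difference concrete, here is a minimal sketch that prints the overall match (group 0) for both variants. The class name GroupVsLookahead and the helper overallMatch are made up for this illustration, and the non-capturing variant is a simplified form of the idea, not the asker's exact regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupVsLookahead {
    // Hypothetical helper: returns the overall match (group 0) of the first hit, or null.
    static String overallMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "RECOVERY: 'XXXXXXXXX' is UP";
        // Non-capturing group: "is" is still consumed, so it is part of the overall match.
        System.out.println("[" + overallMatch("'([^']*)'\\s*(?:is)", line) + "]");
        // Lookahead: "is" is only checked, so the overall match stops before it.
        System.out.println("[" + overallMatch("'([^']*)'\\s*(?=\\bis\\b)", line) + "]");
    }
}
```

In both cases group 1 holds the same quoted text; only the overall match differs.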

Indeed, in the current scenario there is no need for a lookahead. A lookahead only makes sense when the text it checks must remain unconsumed, for example when you need to match subsequent substrings that start with the same sequence of characters.

So, a better alternative would be

'([^']*)'\s*is\b

Java:

Pattern ptrn = Pattern.compile("'([^']*)'\\s*is\\b");
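
As a quick check, the sketch below runs both the lookahead variant and the simpler consuming variant over the five sample lines from the question and prints what each captures in group 1. The class name CompareVariants and the helper firstCapture are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CompareVariants {
    // The five sample lines from the question.
    static final String[] SAMPLES = {
        "RECOVERY: 'XXXXXXXXX' is UP",
        "PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is WARNING",
        "PROBLEM: 'XXXXXXXXX' is DOWN",
        "RECOVERY: 'ABABABAB' on 'XXXXXXXXX' is OK",
        "PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is DOWN"
    };

    static final Pattern LOOKAHEAD = Pattern.compile("'([^']*)'\\s*(?=\\bis\\b)");
    static final Pattern CONSUMING = Pattern.compile("'([^']*)'\\s*is\\b");

    // Hypothetical helper: group 1 of the first match, or null if nothing matches.
    static String firstCapture(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        for (String line : SAMPLES) {
            System.out.println(firstCapture(LOOKAHEAD, line)
                    + " | " + firstCapture(CONSUMING, line));
        }
    }
}
```

Both patterns capture the same text on every sample line, because the quoted name is extracted through group 1 and it does not matter here whether `is` is consumed.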
Wiktor Stribiżew
  • Why bother using a lookahead? You're using a capturing group to extract the part you want, so what does it matter if the `is` is consumed? – Alan Moore Dec 02 '15 at 00:11
  • @AlanMoore What is your solution? I'm not sure I understand. Isn't the lookahead the better solution here, since there could also be an ABABAB instead of XXXXXX? – Yassin Hajaj Dec 02 '15 at 00:22
  • @YassinHajaj: I'm not saying you shouldn't check for `is` after the quoted word, just that you don't need to use a lookahead. Copy [NEO-xx's regex](http://stackoverflow.com/a/34032684/20938) into the code above and you'll get exactly the same result. The lookahead and the word boundaries are both unnecessary complications. – Alan Moore Dec 02 '15 at 00:53
  • @AlanMoore OK, I get it; I'm new to regex, sorry. So, if I understand correctly, he's capturing a group for nothing when he does not need to capture it? I'm talking about the "is" group, of course. – Yassin Hajaj Dec 02 '15 at 01:04
  • @YassinHajaj: I added my explanation why I chose that lookahead. Also, I posted my answer at midnight and went straight to bed, I shouldn't have done that, perhaps (both :)). – Wiktor Stribiżew Dec 02 '15 at 08:03

The following regex should work:

\'([^']+)\'\s+is

For each match, the captured text is then available as group 1 of the matcher, i.e. matcher.group(1).
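
A minimal sketch of how the captures could be collected from a multi-line input with this regex, assuming the sample lines from the question are joined with newlines. The class name CollectNames and the method capturedNames are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CollectNames {
    // Collects group 1 of every match found in the given text.
    static List<String> capturedNames(String text) {
        Matcher m = Pattern.compile("'([^']+)'\\s+is").matcher(text);
        List<String> names = new ArrayList<>();
        while (m.find()) {          // each find() call advances to the next match
            names.add(m.group(1));  // the text between the quotes
        }
        return names;
    }

    public static void main(String[] args) {
        String log = String.join("\n",
            "RECOVERY: 'XXXXXXXXX' is UP",
            "PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is WARNING",
            "PROBLEM: 'XXXXXXXXX' is DOWN",
            "RECOVERY: 'ABABABAB' on 'XXXXXXXXX' is OK",
            "PROBLEM: 'ABABABAB' on 'XXXXXXXXX' is DOWN");
        System.out.println(capturedNames(log));
    }
}
```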

ashishmohite