
Say the input is String1OptionalString2WhatWeWant.

Another possible input is String1WhatWeWant.

I want to match only the WhatWeWant part, and the leading part should go into a prefix (a lookbehind).

However, I can't seem to get this result.

The following regex doesn't produce the desired effect:

(?<=string1optionalstring2|string1)\w+

The match still includes optionalstring2, which I don't want. I assumed the lookbehind would prefer the leftmost, full alternative (string1optionalstring2).

Asked by Valentin Kuzub, edited by Robin

2 Answers


I assume String1 is always present? Then:

(?:String1)(?:OptionalString2)?\w+
Answer by Kabie
  • String1 is always present, but this doesn't work when I run it in my test environment. Does it work for you, or do you assume it will? – Valentin Kuzub Apr 04 '14 at 02:27
  • @ValentinKuzub: so what's the environment? And which language are you using? – Kabie Apr 04 '14 at 02:30
  • Well, I kind of asked first: does it work for you? That would be enough for me to start investigating why it doesn't work for me. The language is C#; I'm using the Expresso program to test regexes, if you're curious. – Valentin Kuzub Apr 04 '14 at 02:32
  • @ValentinKuzub: yes, it should work in [Javascript](http://www.regexr.com/38lin) and [Ruby](http://rubular.com/r/mIqXghFgpT) – Kabie Apr 04 '14 at 02:36
  • OK, cool, I'll investigate. Do you have any idea about the prefix (lookbehind) version, though? Why doesn't it work? – Valentin Kuzub Apr 04 '14 at 02:37
  • If you want to extract the final part (what's matched by `\w+`), you should put it in a capturing group: `(\w+)`. – Alan Moore Apr 04 '14 at 02:46
  • Marked as answer, but I still wonder how to make the version with the prefix work. – Valentin Kuzub Apr 05 '14 at 01:35
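A minimal sketch of Alan Moore's point, written in Python for illustration. The assumption is that Python's `re` treats this simple pattern the same way .NET does. Kabie's pattern matches the entire input, prefix included. Wrapping `\w+` in a capturing group is what isolates WhatWeWant.

```python
import re

# Kabie's pattern, with \w+ wrapped in a capturing group as Alan Moore suggests.
pattern = re.compile(r"(?:String1)(?:OptionalString2)?(\w+)")

m = pattern.search("String1OptionalString2WhatWeWant")
whole_match = m.group(0)  # the entire matched text, prefix included
suffix = m.group(1)       # only the text captured by (\w+)

print(whole_match)  # String1OptionalString2WhatWeWant
print(suffix)       # WhatWeWant
```

So if a tester shows only the overall match, the pattern can look as if it "doesn't work", even though group 1 holds the wanted part.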

What happened

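To understand why the lookbehind behaves in a seemingly incoherent way, remember that the regex engine tries each starting position from left to right and returns the first match it finds. It does not compare candidate matches at different positions to pick the "best" one.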

Let's look at the steps it takes to match (?<=ab|a)\w+ on abc:

  • the engine starts at a. There is nothing before it, so the lookbehind fails
  • the transmission bumps along: the engine now considers a match starting at b
  • the lookbehind tries the first item of the alternation (ab), which fails because only a precedes b
  • ... but the second item (a) matches
  • \w+ matches the rest of the string (bc)

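The overall match is therefore bc, and the regex engine hasn't broken any of its rules in the process. A match starting at b is found before the engine ever reaches the position after ab. In your case, the match starting right after string1 is found first, so it includes optionalstring2.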
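The bump-along described above can be made concrete with a small hand-written simulation. It is not a real regex engine and not how .NET implements lookbehind. It only models the rule that the leftmost starting position wins. `leftmost_lookbehind_match` is a hypothetical helper, and `\w` is approximated as alphanumerics plus underscore.

```python
def leftmost_lookbehind_match(text, alternatives):
    """Hypothetical simulation of (?<=alt1|alt2|...)\\w+ on `text`.

    Tries each start position from left to right and returns the first
    match found, mimicking the transmission's bump-along.
    """
    for start in range(len(text)):
        # Lookbehind: some alternative must end exactly at `start`.
        if not any(text[:start].endswith(alt) for alt in alternatives):
            continue  # bump along to the next start position
        # Greedy \w+ (approximated as alphanumerics or underscore).
        end = start
        while end < len(text) and (text[end].isalnum() or text[end] == "_"):
            end += 1
        if end > start:  # \w+ needs at least one character
            return text[start:end]
    return None


print(leftmost_lookbehind_match("abc", ["ab", "a"]))  # bc
print(leftmost_lookbehind_match(
    "String1OptionalString2WhatWeWant",
    ["String1OptionalString2", "String1"],
))  # OptionalString2WhatWeWant
```

The second call reproduces the question's problem. Position 7 (right after String1) satisfies the shorter alternative, so the match starts there before the longer alternative ever gets a chance.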

How to fix it

If C# supported the \K escape sequence, which drops everything matched so far from the reported match, you could simply let the greediness of ? do the work for you:

string1(?:optionalstring2)?\K\w+

Sadly, .NET regexes don't support \K, so it seems you are stuck with a capturing group and reading the wanted part from group 1:

string1(?:optionalstring2)?(\w+)
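Here is a minimal Python sketch of the capturing-group fix applied to both kinds of input from the question. Case-insensitive matching is an assumption, needed so the lowercase pattern matches the mixed-case samples. `extract_what_we_want` is a hypothetical helper. In C#, the same value would come from `Groups[1].Value` on the `Match`, with `RegexOptions.IgnoreCase`.

```python
import re

# The optional group is greedy, so it consumes optionalstring2 whenever it
# is present; group 1 then holds only what follows.
PATTERN = re.compile(r"string1(?:optionalstring2)?(\w+)", re.IGNORECASE)


def extract_what_we_want(text):
    """Hypothetical helper: return the part after the prefix, or None."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


for sample in ["String1OptionalString2WhatWeWant", "String1WhatWeWant"]:
    print(sample, "->", extract_what_we_want(sample))
```

Both samples yield WhatWeWant. Here the greedy `?` is decided at a single starting position, so the leftmost-match rule no longer works against you.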
Answer by Robin (edited by Community)