
Given that you can reference a capture group in a regex pattern, is it possible to use said capture group in a lookbehind?

If you have the string

"monkeys eat bananas, bananas are terrified of monkeys"

bananas is the first matched pair (its repeat appears earliest in the string), while monkeys is the first word that has a match. I can get monkeys without any issue:

(\w+).*\1 # returns monkeys

But if I want to get the word whose match completes first, I would need to be able to do something like this:

(?<=\1)(\w+)

However, this fails, and I would guess for the simple reason that when the lookbehind is evaluated, \1 means nothing. Is there some more regex magic that I have not come across yet, that would allow me to match something like this?

miah
  • What exactly are you expecting by matching first? – hwnd Sep 08 '13 at 23:52
  • I'm trying to get the first matching pair, so in my example bananas gets repeated before monkeys, so bananas is returned. – miah Sep 09 '13 at 00:15

1 Answer


Many regular expression engines require a backreference to appear after the group it references (see my related question about this behavior in .NET).

Try using a lookahead instead:

(\w+)(?=.*\1)
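
As a rough illustration, here is how the two patterns behave in Python's `re` module. Python is only an assumed engine for this sketch (the question does not name one), and other engines may handle a forward reference differently, for example by compiling it and simply never matching.

```python
import re

text = "monkeys eat bananas, bananas are terrified of monkeys"

# The lookbehind from the question references group 1 before it is defined.
# Python's re module refuses to compile such a pattern.
try:
    re.compile(r"(?<=\1)(\w+)")
    forward_reference_ok = True
except re.error:
    forward_reference_ok = False

# The lookahead version defines the group first and references it afterwards.
# The lookahead only checks that the word occurs again; it consumes nothing,
# so the overall match is just the first occurrence of the word.
match = re.search(r"(\w+)(?=.*\1)", text)
whole_match = match.group(0)
match_end = match.end()
```

Here `whole_match` is just the first `monkeys`, and `match_end` points right after it rather than at the end of the string.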
p.s.w.g
  • Isn't this the equivalent of `(\w+).*\1`? i.e. `(\w+)` matches monkeys, and then the `lookahead` see that monkeys is repeated? – miah Sep 09 '13 at 00:21
  • @miah Not precisely. In `(\w+).*\1` the whole match consists of the entire string between both instances (with only the first instance captured in a group). Using the lookahead, the whole match consists of just the first instance. That means that given your above sample, this pattern would match both `monkeys` and `bananas` (yours would match only `monkeys`). – p.s.w.g Sep 09 '13 at 00:28
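
The difference described in this comment can be checked with a short sketch, again assuming Python's `re` module as the engine. The `\b` word boundaries in the last pattern are an addition that is not part of the original answer: without them, `\w+` can backtrack to fragments such as a single letter that happens to recur later in the string.

```python
import re

text = "monkeys eat bananas, bananas are terrified of monkeys"

# Consuming version: one match that spans from the first "monkeys" to the last.
consuming = [m.group(0) for m in re.finditer(r"(\w+).*\1", text)]

# Lookahead version from the answer: each match is only the first instance,
# so the scan continues and later repeated words are found as well.
unbounded = [m.group(0) for m in re.finditer(r"(\w+)(?=.*\1)", text)]

# Hypothetical refinement: require whole words on both sides of the comparison.
whole_words = [
    m.group(0) for m in re.finditer(r"\b(\w+)\b(?=.*\b\1\b)", text)
]
```

With the sample string, `consuming` holds a single match covering the whole sentence, `unbounded` contains `monkeys` and `bananas` along with partial-word fragments such as `e`, and `whole_words` contains only `monkeys` and `bananas`. Note that the matches are listed in order of their first occurrence, so `monkeys` still comes before `bananas`.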