I have the same problem as the one in this question, but with PCRE2 (regex101.com):

Regexp find a match with priority order?

I have this pattern:

/.*?(hello)|.*?(hey)|.*?(hi)/ms

applied to this test string:

hi hey hello

The problem is that the word is captured in a different group depending on which alternative matched: hello is stored in group 1, hey in group 2, and hi in group 3. I want the result to always be in group 1 instead.
How can I do that?

https://regex101.com/r/bc8XQE/1

Mario Palumbo

1 Answer

Using PCRE, instead of 3 different groups, you can keep your pattern with the 3 alternatives and use \K to forget what has been matched so far, so that the reported match is only the word itself.

The word boundary \b prevents a partial word match.

.*?\K\bhello\b|.*?\K\bhey\b|.*?\K\bhi\b

See a regex demo.
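The \K version can also be tried from the shell. A minimal sketch, assuming GNU grep built with PCRE support (`-P`); `priority_word` is a hypothetical helper name. `-o` prints only the reported match, which starts where \K reset it:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the highest-priority whole word (hello > hey > hi)
# found on each input line. Requires GNU grep with -P (PCRE) support.
priority_word() {
  # \K drops the lazily skipped prefix from the reported match;
  # \b rejects partial words such as the "hi" inside "this".
  grep -Po '.*?\K\bhello\b|.*?\K\bhey\b|.*?\K\bhi\b'
}

printf 'hi hey hello\n' | priority_word              # hello
printf 'hi hey\n' | priority_word                    # hey
printf 'this\n' | priority_word || echo '(no match)' # (no match)
```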

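If you really must have the result in group 1, you can use a branch reset group (?|...), in which every alternative numbers its capture groups starting from the same number: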

(?|.*?\b(hello)\b|.*?\b(hey)\b|.*?\b(hi)\b)

See another regex demo.
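grep can only print the whole match, not a capture group, so this sketch checks the branch reset version with Perl instead, which has supported the same `(?|...)` syntax since Perl 5.10. It assumes `perl` is installed; `branch_reset_word` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print group 1 of the branch reset pattern for each line.
# Inside (?|...) every alternative numbers its groups from 1, so the matched
# word is always in $1 regardless of which alternative succeeded.
branch_reset_word() {
  perl -ne 'print "$1\n" if /(?|.*?\b(hello)\b|.*?\b(hey)\b|.*?\b(hi)\b)/'
}

printf '%s\n' 'hi hey hello' 'hi hey' 'hi' | branch_reset_word
# hello
# hey
# hi
```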

The fourth bird
  • Yes, this solution is perfect. Getting only the match (without a group) is even better. – Mario Palumbo Jun 30 '23 at 08:50
  • I need the `/ms` flags for use with grep, in a form like this: `grep -P -- '/.*?\K\bhello\b|.*?\K\bhey\b|.*?\K\bhi\b/ms'`, which of course does not give the desired result. How do I set the same flags I use on regex101 when calling grep from bash? – Mario Palumbo Jun 30 '23 at 09:18
  • @MarioPalumbo You don't need the forward slashes, and use `-o` to get only the match `grep -Po '.*?\K\bhello\b|.*?\K\bhey\b|.*?\K\bhi\b' file` – The fourth bird Jun 30 '23 at 09:31
  • In case the words are separated by newlines rather than spaces, I must also use the `-z` option and the `s` modifier: `grep -Pzo -- '(?s).*?\K\bhello\b|.*?\K\bhey\b|.*?\K\bhi\b'`, but sadly the actual modifiers would then be `gms` when all I need is `ms`. The `-m 1` option has no effect because of the `-z` option, and removing a modifier only works with, for example, `(?-m)`, which leaves `gs`, and with `(?-s)`, which leaves `gm`, but not with `(?-g)`, which oddly gives an error. The solution would be `(?s-g)` (add `s`, remove `g`), but it does not accept the letter `g` at all. – Mario Palumbo Jun 30 '23 at 11:04
  • You are right, `m` is not needed in my case. Strangely, the `s` flag has the desired effect even with the `-z` option. The only problem is `g`. – Mario Palumbo Jun 30 '23 at 11:12
  • Your demo output shows `hellohellohellohello` due to the `g` modifier. So the underlying problem is really "grep only the first match and stop", which already has a topic on Stack Overflow, but unfortunately it didn't help me. – Mario Palumbo Jun 30 '23 at 11:35
  • @MarioPalumbo There is no `g` modifier, using `-o` returns all matches on the same line. You can get the first occurrence with `^.*?\K\bhello\b|^.*?\K\bhey\b|^.*?\K\bhi\b` – The fourth bird Jun 30 '23 at 11:39
  • `echo -e 'hi\nhey\nhello' | grep -Poz '^.*?\K(?:\bhello\b|.*?\K\bhey\b|.*?\K\bhi\b)'` returns `hi` and is not what I want. With the previous formula, without the `g` modifier, the problem would be solved. – Mario Palumbo Jun 30 '23 at 11:44
  • @MarioPalumbo `echo -e 'hi\nhey\nhello' | grep -Poz '(?s)(?:^.*?\K\bhello\b|^.*?\K\bhey\b|^.*?\K\bhi\b)'` now returns `hello` – The fourth bird Jun 30 '23 at 12:08
  • Yes, you are right: with `^` it is forced to match one alternative at a time, because each of them starts from the beginning. Best idea. The `(?:)` group is not necessary. – Mario Palumbo Jun 30 '23 at 12:42
  • @MarioPalumbo You are right, you can omit `(?:` – The fourth bird Jun 30 '23 at 12:46
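The comment thread arrives at a working grep command. The sketch below collects the cases discussed there side by side. It assumes GNU grep with `-P` support and a version that accepts `^` together with `-z` (some older grep releases reportedly reject that combination); the variable names are illustrative:

```shell
#!/usr/bin/env bash
# Sketch of the cases from the comments; requires GNU grep with -P support.
# Variable names are illustrative.
unanchored='.*?\K\bhello\b|.*?\K\bhey\b|.*?\K\bhi\b'
anchored='^.*?\K\bhello\b|^.*?\K\bhey\b|^.*?\K\bhi\b'

# 1. Without -z, each line is a separate subject, so every line reports
#    its own word instead of the single highest-priority one.
printf 'hi\nhey\nhello\n' | grep -Po "$unanchored"
# hi
# hey
# hello

# 2. grep has no "g" flag: -o prints every match on a line, so a repeated
#    word shows up twice. Anchoring each alternative with ^ keeps only the
#    first match, because a later match cannot start at the line's beginning.
printf 'hello and hello\n' | grep -Po "$unanchored"
# hello
# hello
printf 'hello and hello\n' | grep -Po "$anchored"
# hello

# 3. With -z the whole input becomes one record, (?s) at the front lets "."
#    cross newlines in every alternative, and ^ keeps only the first match.
#    -z also ends each printed match with a NUL, which tr turns into a newline.
printf 'hi\nhey\nhello\n' | grep -Pzo "(?s)$anchored" | tr '\0' '\n'
# hello
```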