2

I have to use a regular expression to match several strings and capture parts of the string.

Example strings could look like:


  • robert eric palmer sent for the boat
  • robert eric william palmer sent for the boat

The goal is to lazily match and capture the middle name(s) of robert palmer, up to the first point where the surname (palmer) appears in the string, AND to ensure the rest of the string matches the static text (robert ___ palmer sent for the boat).

I have used a positive lookahead to find the middle name and stop matching if palmer is found:

/robert (.+?)(?=\spalmer) palmer/

which correctly matches:

robert eric palmer

robert eric william palmer

and correctly doesn't match:

robert eric william palmer palmer


The problem:

When I add the rest of the static text to the regex:

/robert (.+?)(?=\spalmer) palmer sent for the boat/

it incorrectly matches:

robert eric william palmer palmer sent for the boat
robert eric palmer palmer sent for the boat
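
The difference between the two patterns can be reproduced with a minimal sketch. It assumes Python's `re` module (the question does not name a regex flavor), and the variable names are illustrative only. The lazy `(.+?)` is still free to grow during backtracking, so once the trailing static text fails after the first palmer, the engine extends the group across it:

```python
import re

# Pattern without and with the trailing static text, as in the question.
short = re.compile(r"robert (.+?)(?=\spalmer) palmer")
full = re.compile(r"robert (.+?)(?=\spalmer) palmer sent for the boat")

text = "robert eric palmer palmer sent for the boat"

m_short = short.search(text)
m_full = full.search(text)

# The short pattern is satisfied at the first palmer.
print(repr(m_short.group(1)))  # 'eric'
# The full pattern backtracks and lets the group swallow the first palmer.
print(repr(m_full.group(1)))   # 'eric palmer'
```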

How can I lazy match up to palmer for the middle name and still assert the rest of the static text matches?

I hope this makes sense!

Wiktor Stribiżew
Norix
  • Maybe all you need is `robert (.+?) palmer sent for the boat`? The `(?=\spalmer)` is redundant in `robert (.+?)(?=\spalmer) palmer sent for the boat` – Wiktor Stribiżew Dec 15 '19 at 00:34
  • I need it to stop matching if palmer exists in between “robert” and “palmer” though, so “robert eric palmer” is fine but “robert eric palmer palmer” shouldn’t match. – Norix Dec 15 '19 at 00:39
  • Then, the third example - `robert eric palmer palmer sent for the boat` - should be no match and you want `robert ((?:(?!palmer).)+?) palmer sent for the boat` – Wiktor Stribiżew Dec 15 '19 at 01:21
  • This seems to work as required but I’m not sure how. Would you mind breaking it down? – Norix Dec 15 '19 at 13:50
  • To be clear, your requirement is "match all words after robert and before [the first palmer], if that is followed by this string." Your current regex is "match all words after robert and before the first [palmer that is followed by this string]". It's close, but not exactly the same. Wiktor gave a good regex answer for this. – justhalf Dec 15 '19 at 14:51
  • @WiktorStribiżew Please don't edit the question just to fit your answer better. This should be done by the OP. I rolled back to the original question and think you're on the wrong track here. I think OP wants to match `eric` in `robert eric palmer palmer sent for the boat`. Only Norix can clarify that. – bobble bubble Dec 16 '19 at 10:39

3 Answers

2

You may use

robert ((?:(?!palmer).)+?) palmer sent for the boat

See the regex demo.

Details

  • robert - a literal substring
  • ((?:(?!palmer).)+?) - capturing group #1 containing a tempered greedy token: it matches any char (.), one or more times but as few as possible, provided that char does not start a palmer char sequence
  • palmer sent for the boat - a literal substring.
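
The breakdown above can be tried out with a short sketch. It assumes Python's `re` module, which is not named anywhere in the question, and the sample list and the `results` name are only for illustration:

```python
import re

# Tempered greedy token: each char is consumed only if "palmer" does not start there.
pattern = re.compile(r"robert ((?:(?!palmer).)+?) palmer sent for the boat")

samples = [
    "robert eric palmer sent for the boat",
    "robert eric william palmer sent for the boat",
    "robert eric palmer palmer sent for the boat",
]

results = {}
for s in samples:
    m = pattern.fullmatch(s)
    # Store the captured middle name(s), or None when the string is rejected.
    results[s] = m.group(1) if m else None

for s, captured in results.items():
    print(f"{s!r} -> {captured!r}")
```

The first two strings yield `eric` and `eric william`; the third is rejected because the group cannot cross the first palmer and the static text does not follow it.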

To unroll the pattern for better performance, use

robert ([^p]*(?:p(?!almer)[^p]*)*) palmer sent for the boat

See this regex demo.
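
As a rough check that the unrolled form behaves like the tempered one on inputs of this shape, here is a small sketch assuming Python's `re` module. The extra sample with a middle name starting with p is a made-up test string, not one from the question. Note that the two patterns are not identical in every edge case: the unrolled one uses `*`, so it also accepts an empty group, and `[^p]` matches newlines while `.` does not by default.

```python
import re

tempered = re.compile(r"robert ((?:(?!palmer).)+?) palmer sent for the boat")
unrolled = re.compile(r"robert ([^p]*(?:p(?!almer)[^p]*)*) palmer sent for the boat")

samples = [
    "robert eric palmer sent for the boat",
    "robert eric william palmer sent for the boat",
    "robert philip palmer sent for the boat",       # hypothetical: a "p" that does not start "palmer"
    "robert eric palmer palmer sent for the boat",  # should be rejected by both
]

def capture(rx, s):
    """Return group 1 of a full match, or None if the string is rejected."""
    m = rx.fullmatch(s)
    return m.group(1) if m else None

comparison = [(s, capture(tempered, s), capture(unrolled, s)) for s in samples]
for s, a, b in comparison:
    print(f"{s!r}: tempered={a!r} unrolled={b!r}")
```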

Wiktor Stribiżew
  • Nice answer +1. Also in this case the lazy +? does not do anything useful, I believe it should be the same as the greedy one. – justhalf Dec 15 '19 at 14:53
  • @justhalf I'd rather stick to a lazy quantifier here, as the text between `robert` and `palmer` should not be longer than the rest of the string. Maybe it makes sense to add `robert` there, `(?:(?!palmer|robert).)+?` – Wiktor Stribiżew Dec 15 '19 at 15:01
  • @justhalf thank you for your answer above but I think the answer from Wiktor is exactly what I needed here. Thanks both for taking the time to look at this and Wiktor, thank you for breaking down the answer. I really appreciate it. – Norix Dec 15 '19 at 15:48
  • I mean, due to the negative lookahead, the regex with or without the lazy quantifier matches the same set of strings, because the capture group can only end exactly before the first palmer in either case. I like the idea of unrolling it too! – justhalf Dec 16 '19 at 02:30
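
The point discussed in the comments above, that the lazy and greedy forms of the tempered token accept the same strings and capture the same text, can be checked with a small sketch. It assumes Python's `re` module and uses the question's sample strings; it illustrates the claim on these inputs and does not prove it or measure performance.

```python
import re

lazy = re.compile(r"robert ((?:(?!palmer).)+?) palmer sent for the boat")
greedy = re.compile(r"robert ((?:(?!palmer).)+) palmer sent for the boat")

samples = [
    "robert eric palmer sent for the boat",
    "robert eric william palmer sent for the boat",
    "robert eric palmer palmer sent for the boat",
]

def capture(rx, s):
    """Return group 1 of a full match, or None if there is no match."""
    m = rx.fullmatch(s)
    return m.group(1) if m else None

lazy_results = [capture(lazy, s) for s in samples]
greedy_results = [capture(greedy, s) for s in samples]

print(lazy_results)    # ['eric', 'eric william', None]
print(greedy_results)  # ['eric', 'eric william', None]
```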
1

As already mentioned, the lookahead in your sample is not needed. If you want to lazily match the part up to the first palmer, optionally followed by more text ending in palmer and then the specified substring, add that to the pattern.

robert (.+?) palmer(?:.* palmer)? sent for the boat

The optional greedy (?:.* palmer)? will consume the gap between the lazy part and sent for the boat.

See this demo at regex101 (`(?:` opens a non-capturing group).


If only consecutive repetitions of palmer can follow, an idea is to use robert (.+?) (?:palmer )+sent for the boat
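
A minimal sketch of how the two patterns in this answer differ, assuming Python's `re` module. The string with `smith` in it is a hypothetical input added to show a non-consecutive second palmer:

```python
import re

# Optional greedy tail: anything may sit between the first palmer and a later one.
with_gap = re.compile(r"robert (.+?) palmer(?:.* palmer)? sent for the boat")
# Only immediately repeated "palmer " is allowed after the captured part.
consecutive = re.compile(r"robert (.+?) (?:palmer )+sent for the boat")

doubled = "robert eric palmer palmer sent for the boat"
spaced = "robert eric palmer smith palmer sent for the boat"  # hypothetical input

gap_doubled = with_gap.fullmatch(doubled).group(1)
gap_spaced = with_gap.fullmatch(spaced).group(1)
con_doubled = consecutive.fullmatch(doubled).group(1)
con_spaced = consecutive.fullmatch(spaced).group(1)

print(gap_doubled, "|", gap_spaced)  # eric | eric
print(con_doubled, "|", con_spaced)  # eric | eric palmer smith
```

With the second pattern the lazy group has to grow past the first palmer whenever other text separates the two, which is why it only suits the consecutive case.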

bobble bubble
0

What about using a greedy match instead? For example:

robert (.+) palmer

Otherwise the match could stop at the first occurrence of palmer instead of the last. Example here.
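
A small sketch of the greedy versus lazy difference described here, assuming Python's `re` module and one of the question's sample strings:

```python
import re

text = "robert eric palmer palmer sent for the boat"

# Greedy: the group runs up to the last " palmer" in the string.
greedy_capture = re.search(r"robert (.+) palmer", text).group(1)
# Lazy: the group stops before the first " palmer".
lazy_capture = re.search(r"robert (.+?) palmer", text).group(1)

print(repr(greedy_capture))  # 'eric palmer'
print(repr(lazy_capture))    # 'eric'
```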


David542