
I have a bit of RegEx I am trying to figure out: `( of [A-Za-z ]+)?`

The above portion of my Regex will match the following:

of New Mexico and Mrs Smith.

What I am trying to do is have the RegEx stop before `and`.

( of [A-Za-z ]+)\sand?

The above RegEx is very close to solving the issue; however, it still matches `and`.

The above matches:

of New Mexico and

I want it to output:

of New Mexico

CodeHard
  • What **exactly** are you trying to do by stopping before `and` ? I mean there should be some logical pattern here that regex could match. –  Apr 13 '16 at 13:13
  • do you mean with this specific text or in general? what are the expected inputs? – Pedru Apr 13 '16 at 13:14
  • Is this what you want ? `of\s([A-Za-z ]+)\sand?` –  Apr 13 '16 at 13:17
  • I don't want and or anything after to be matched. The inputs vary and sometimes AND will not even be in the text. For example: Mark of New Mexico. "of New Mexico" will match. Mark of New Mexico and Tom of West Virginia. will also match. I want to stop the match before "and" – CodeHard Apr 13 '16 at 13:17
  • Please edit your post and put: A test case with an input phrase and the output you'd like to see –  Apr 13 '16 at 13:19
  • Dex'ter that is VERY close but still matches and – CodeHard Apr 13 '16 at 13:20
  • @Dex'ter: it will match `of New Mexico anything`. `and?` does not do what you are suggesting it does. – Jongware Apr 13 '16 at 13:46
  • You could do it this way: `(.*)\sand`. [DEMO](https://regex101.com/r/nQ4gA3/2) – Quinn Apr 13 '16 at 14:39
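To illustrate Jongware's comment, here is a minimal sketch of what the pattern from the question does. The sample strings are assumptions chosen for the demonstration. `and?` means "`an` followed by an optional `d`", so any word starting with `an` satisfies it, and text without such a word produces no match at all.

```python
import re

# Pattern from the question: "and?" is "an" plus an optional "d".
pattern = re.compile(r'( of [A-Za-z ]+)\sand?')

m = pattern.search("Mark of New Mexico anything")
print(repr(m.group()))    # ' of New Mexico an'
print(repr(m.group(1)))   # ' of New Mexico'

# With no " an..." after the phrase, the whole pattern fails to match.
print(pattern.search("Mark of New Mexico."))  # None
```

Note that capture group 1 already excludes the trailing word, but the pattern still requires something starting with `an` to follow.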

1 Answer


You can use a tempered greedy token:

( of (?:(?!\band\b)[A-Za-z ])+)?
     ^^^^^^^^^^^^^^^^^^^^^^^^^


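The `(?:(?!\band\b)[A-Za-z ])+` construct matches one or more characters from the `[A-Za-z ]` character class, checking before each character that the whole word `and` does not start at that position. Matching therefore stops right before `and`, or at the first character outside the class if `and` never appears.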

Python demo:

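import re

# Tempered greedy token: consume letters and spaces, but stop where the word "and" begins
p = re.compile(r'( of (?:(?!\band\b)[A-Za-z ])+)?')
s = " of New Mexico and Mrs Smith."
m = p.search(s)
if m:
    # strip() drops the leading space and the space matched just before "and"
    print(m.group().strip())  # of New Mexico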
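As a further sketch, here is the same token run over inputs like those mentioned in the question's comments. The helper name `of_phrase` and the sample strings are illustrative assumptions, not part of the original demo. The outer `?` is dropped here: with the whole group optional, `re.search` succeeds with an empty match at position 0 whenever the string does not begin with ` of`, so the phrase would never be found in `Mark of New Mexico`.

```python
import re

# Same tempered greedy token, but without the trailing "?" so that
# re.search has to locate an actual " of ..." phrase.
OF_PHRASE = re.compile(r' of (?:(?!\band\b)[A-Za-z ])+')

def of_phrase(text):
    """Return the ' of ...' phrase up to (not including) the word 'and', or None."""
    m = OF_PHRASE.search(text)
    return m.group().strip() if m else None

samples = [
    "Mark of New Mexico.",
    "Mark of New Mexico and Tom of West Virginia.",
    " of New Mexico and Mrs Smith.",
    "Mark of Portland",  # "and" inside a word is not a whole word, so it is kept
]
for s in samples:
    print(repr(s), '->', of_phrase(s))
```

The `\b` anchors are what keep `Portland` intact: the lookahead only rejects a position where `and` stands as a separate word. The check is case-sensitive as written, so `AND` would need `re.IGNORECASE` or an adjusted lookahead.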
Wiktor Stribiżew
  • You are welcome. Note that the tempered greedy token can be considered a multicharacter "synonym" for a negated character class. You cannot use `[^and]+` since it would just fail to match individual characters `a`, `n` and `d`. When using the tempered greedy token, you disallow matching the *sequence* of characters. – Wiktor Stribiżew Apr 13 '16 at 13:32
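The difference described in this comment can be sketched side by side. The sample string below is an assumption chosen so that the letters `a`, `n` and `d` occur inside an ordinary word.

```python
import re

text = "Maryland and Ohio"

# Negated character class: stops at the first 'a', 'n' or 'd' anywhere.
negated = re.match(r'[^and]+', text).group()

# Tempered greedy token: stops only where the whole word "and" begins.
tempered = re.match(r'(?:(?!\band\b)[A-Za-z ])+', text).group()

print(repr(negated))   # 'M'
print(repr(tempered))  # 'Maryland '
```

The negated class gives up at the `a` in `Maryland`, while the tempered token runs up to the space before the standalone word `and`.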