As part of some lemmatization rules, I'm trying to form a regular expression that will match strings ending in 'ses', 'zes', 'xes', 'ches' or 'shes' and I'm having difficulty getting the letter groupings correct. I thought the following would work...

re.fullmatch(r'.*[szx(ch)(sh)]es\b', infl)

but I see that this will match 'ces' or 'hes' word endings, where I only want it to match 'ches' word endings (same for the (sh) grouping). I must be misunderstanding how to 'or' groups together correctly. Whenever I put brackets around a set of groups, I match every letter inside them, not just the letter combinations.

How can I rewrite the fullmatch expression so that it works correctly? I must be misunderstanding how combining groupings works, so a short explanation of that, in this context, would also be helpful.

BTW... I only need a true/false return. I'm not interested in the returned values.

Some matching examples are: dismisses, waltzes, indexes, detaches, distinguishes.

bivouac0

1 Answer


Your regex does not work correctly even in Java as groupings are not supported inside character classes. The ( and ) are treated as literal parentheses inside [...].
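To see this in Python, here is a small sketch. The helper name class_matches is made up for illustration; the bracket expression is the one from the question. Inside [...] every character stands for itself, so the class is just the set s, z, x, c, h, ( and ).

```python
import re

# The bracket expression from the question. Inside [...] the parentheses are
# literal characters, so this is the set {s, z, x, (, c, h, )}.
CLASS_PATTERN = re.compile(r'[szx(ch)(sh)]es')

def class_matches(word):
    """Hypothetical helper: True if any single character from the set precedes 'es'."""
    return CLASS_PATTERN.search(word) is not None

# 'c' and 'h' on their own are enough, and so is a literal parenthesis.
for word in ['laces', 'ashes', 'a(es', 'a)es']:
    print(word, class_matches(word))
```

This is why 'ces' and 'hes' endings slip through: the class never sees 'ch' or 'sh' as a unit.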

re.fullmatch requires the pattern to match the entire string. If you do not care what comes before the ending, just use re.search and anchor the pattern at the end of the string.

Use

re.search(r'(?:[zx]|ch|sh?)es$', s)
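If you would rather keep re.fullmatch, the same alternation can be used with a leading .* to consume the start of the word. The sketch below compares the two forms; the function names are made up for illustration.

```python
import re

def matches_with_search(word):
    # Anchored at the end with $, so the start of the word is ignored.
    return re.search(r'(?:[zx]|ch|sh?)es$', word) is not None

def matches_with_fullmatch(word):
    # fullmatch must consume the whole string, so '.*' absorbs the leading part.
    return re.fullmatch(r'.*(?:[zx]|ch|sh?)es', word) is not None

for word in ['dismisses', 'waltzes', 'indexes', 'detaches', 'distinguishes', 'laces']:
    print(word, matches_with_search(word), matches_with_fullmatch(word))
```

Both forms should agree on ordinary single-line words. One difference to be aware of: $ also matches just before a trailing newline, while fullmatch does not allow one.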

See the regex demo and a Regulex graph.

Details

  • (?:[zx]|ch|sh?) - a non-capturing group matching either of
    • [zx] - z or x
    • | - or
    • ch - ch char sequence
    • | - or
    • sh? - s or sh
  • es - es substring
  • $ - end of string.
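Since only a true/false result is needed, the pattern can be wrapped in a small predicate. This is a minimal sketch: the names SIBILANT_ES and ends_in_sibilant_es are hypothetical, and the non-matching words are my own examples, not ones from the question.

```python
import re

# Pattern from the answer, precompiled once for reuse.
SIBILANT_ES = re.compile(r'(?:[zx]|ch|sh?)es$')

def ends_in_sibilant_es(word):
    """Hypothetical helper: True if word ends in ses, zes, xes, ches or shes."""
    return SIBILANT_ES.search(word) is not None

# Matching examples from the question.
for word in ['dismisses', 'waltzes', 'indexes', 'detaches', 'distinguishes']:
    assert ends_in_sibilant_es(word)

# Assumed non-matching examples: 'ces', 'thes' and 'ess' endings.
for word in ['laces', 'bathes', 'dress']:
    assert not ends_in_sibilant_es(word)
```

Comparing the match object against None gives a plain bool, which is all a lemmatization rule check needs.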
Wiktor Stribiżew
  • Why is the ?: ("without remembering") sign needed here? – srknzl Apr 18 '19 at 13:46
  • @srknzl It is best practice to use a non-capturing group when you do not need to access the group value after matching. – Wiktor Stribiżew Apr 18 '19 at 13:47
  • This rule should not match strings with 'ess' on the end. That's a different lemmatization rule. – bivouac0 Apr 18 '19 at 13:54
  • Can I ask whether Python and JavaScript regexes use the same syntax? The Regulex graph site says "JavaScript regex visualizer" at the top. Does it work for Python too? Thanks – srknzl Apr 18 '19 at 13:55
  • @srknzl Those are different regex engines but this exact pattern will work the same in both languages. – Wiktor Stribiżew Apr 18 '19 at 13:57
  • @bivouac0 The regex [does not match](https://regex101.com/r/dljlaP/4) strings ending with `ess`. Your `r'.*[szx(ch)(sh)]es\b'` pattern (that you say works in Java, although it does not) does not contain any quotation marks. Please share a regex fiddle showing your real problem. – Wiktor Stribiżew Apr 18 '19 at 13:58
  • @SanV There are no quotes in the original pattern. – Wiktor Stribiżew Apr 18 '19 at 13:59
  • @WiktorStribiżew fair enough. respect to Wiktor, the wizard / regexp.ert :) – SanV Apr 18 '19 at 14:04
  • @WiktorStribiżew It's possible the java version has some issues. It's from code that I assumed worked but maybe it's bugged or I'm misunderstanding what they intended. I'll edit the post to remove the java comment. – bivouac0 Apr 18 '19 at 14:05
  • @bivouac0 Better add the matching and non-matching examples. – Wiktor Stribiżew Apr 18 '19 at 14:07