Python regexp to match full or partial word

Question

Is there a way to get regexp to match as much of a specific word as is possible? For example, if I am looking for the following words: yesterday, today, tomorrow

I want the following full words to be extracted:

yest

yesterday

tod

toda

today

tom

tomor

tomorrow

The following whole words should fail to match (basically, spelling mistakes):

yesteray

tomorow

tommorrow

tody

The best I could come up with so far is:

\b((tod(a(y)?)?)|(tom(o(r(r(o(w)?)?)?)?)?)|(yest(e(r(d(a(y)?)?)?)?)?))\b (Example)

Note: I could implement this using a finite state machine but thought it would be a giggle to get regexp to do this. Unfortunately, anything I come up with is ridiculously complex and I'm hoping that I've just missed something.

Show us what you've tried so that we can suggest the use cases you'd have missed. — CinCout, Dec 31 '15 at 05:00
@Martin - because "yes" is another word that I might be looking for — dkf, Dec 31 '15 at 05:36

Wiktor Stribiżew · Accepted Answer · 2016-01-02T09:12:30.800

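The regex you are looking for should combine alternation (one branch per word) with nested optional non-capturing groups, so that each additional letter is only allowed if every letter before it has matched.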

\b(yest(?:e(?:r(?:d(?:ay?)?)?)?)?|tod(?:ay?)?|tom(?:o(?:r(?:r(?:ow?)?)?)?)?)\b

Note that \b word boundaries are very important since you want to match whole words only.

Regex explanation:

\b - leading word boundary
(yest(?:e(?:r(?:d(?:ay?)?)?)?)?|tod(?:ay?)?|tom(?:o(?:r(?:r(?:ow?)?)?)?)?) - a capturing group matching
- yest(?:e(?:r(?:d(?:ay?)?)?)?)? - yest, yeste, yester, yesterd, yesterda or yesterday
- tod(?:ay?)? - tod or toda or today
- tom(?:o(?:r(?:r(?:ow?)?)?)?)? - tom, tomo, tomor, tomorr, tomorro, or tomorrow
\b - trailing word boundary
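
Since the nesting is mechanical, the pattern can also be generated rather than typed by hand. The following is only a sketch: the helper name `prefix_pattern` and the `words` mapping (word to minimum prefix length) are made up for illustration and are not part of the answer above. It emits `(?:w)?` where the hand-written regex uses the shorter `w?`; the two are equivalent.

```python
import re

def prefix_pattern(word, min_len):
    """Return a regex fragment matching any prefix of `word` that is
    at least `min_len` characters long (hypothetical helper)."""
    head, tail = word[:min_len], word[min_len:]
    nested = ""
    # Wrap from the last character outwards, so that each letter is
    # optional only together with everything that follows it.
    for ch in reversed(tail):
        nested = "(?:" + re.escape(ch) + nested + ")?"
    return re.escape(head) + nested

# Assumed minimum prefix lengths, taken from the examples in the question.
words = {"yesterday": 4, "today": 3, "tomorrow": 3}

pattern = re.compile(
    r"\b(?:" + "|".join(prefix_pattern(w, n) for w, n in words.items()) + r")\b",
    re.IGNORECASE,
)

print(prefix_pattern("today", 3))
print(pattern.findall("yest tod tomorrow tody tomorow Today"))
```

Adding another word (for example "yes", mentioned in the comments) then only means adding an entry to `words`.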

See Python demo:

import re
# Plain raw string: the Python 2 ur'' prefix is a syntax error in Python 3.
p = re.compile(r'\b(yest(?:e(?:r(?:d(?:ay?)?)?)?)?|tod(?:ay?)?|tom(?:o(?:r(?:r(?:ow?)?)?)?)?)\b', re.IGNORECASE)
# Valid prefixes first, then the misspellings that must not match.
test_str = "yest\nyeste\nyester\nyesterd\nyesterda\nyesterday\ntod\ntoda\ntoday\ntom\ntomo\ntomor\ntomorr\ntomorro\ntomorrow\n\nyesteray\ntomorow\ntommorrow\ntody\nyesteday"
print(p.findall(test_str))
# => ['yest', 'yeste', 'yester', 'yesterd', 'yesterda', 'yesterday', 'tod', 'toda', 'today', 'tom', 'tomo', 'tomor', 'tomorr', 'tomorro', 'tomorrow']

I like your solution for today, very clean. The solutions for yest(erday) and tom(orrow) are incomplete. If you look at the one that I came up with, it covers every permutation (minus the case insensitivity) of yest* and tom*; that's what I'm looking for. Sorry, I thought I'd made that clear. — dkf, Jan 02 '16 at 06:13
hmm... looking at your syntax, though, i like that the word permutations are using non-capturing groups, which is where mine produces ugly results. Yours could be expanded to `\b(tom(?:o(?:r(?:r(?:o(?:w)?)?)?)?)?)\b`, for example... It's an ugly re but it is cleaner than mine. — dkf, Jan 02 '16 at 06:23

Santanu Dey · Answer 2 · 2015-12-31T05:09:26.260

Pipe-separate all the valid words or word prefixes, as below. This matches only the valid spellings, as desired.

^(?|yest|yesterday|tod|today)\b

Tested this already at https://regex101.com/

edited Dec 31 '15 at 05:09

answered Dec 31 '15 at 05:04

This is a PCRE regex. The question is related to Python. – Wiktor Stribiżew Dec 31 '15 at 09:43
I see. There is a nice input on PCRE regex support on python here. http://stackoverflow.com/questions/16940150/how-can-i-use-pcre-regexes-from-a-python-script – Santanu Dey Dec 31 '15 at 09:58
That is not PCRE, **PyPi regex** module is even cooler :) It really is a combination of all the best features from PCRE and .NET regex flavors. – Wiktor Stribiżew Dec 31 '15 at 09:59
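
For reference, the pipe-separated idea can be made to work with Python's built-in `re` module, which rejects the PCRE branch-reset group `(?|...)`, by using a plain non-capturing group and listing every allowed prefix explicitly. This is a sketch under assumptions: the helper `all_prefixes` and the `targets` mapping are invented for illustration, and the minimum prefix lengths are taken from the question's examples.

```python
import re

def all_prefixes(word, min_len):
    """List every prefix of `word` from `min_len` characters up to the
    full word (hypothetical helper)."""
    return [word[:i] for i in range(min_len, len(word) + 1)]

# Assumed minimum prefix lengths, as in the question's examples.
targets = {"yesterday": 4, "today": 3, "tomorrow": 3}

# Longest alternatives first; with \b on both sides the order does not
# change the result, but it keeps the generated pattern easy to read.
alternatives = sorted(
    (p for w, n in targets.items() for p in all_prefixes(w, n)),
    key=len,
    reverse=True,
)

flat_pattern = re.compile(
    r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b"
)

print(flat_pattern.findall("yest yesterda tody today tommorrow"))
```

The flat alternation is longer than the nested version but arguably easier to read and to debug.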
