Regex to find words that start or end with a particular letter

Question

Write a function called getWords(sentence, letter) that takes in a sentence and a single letter, and returns a list of the words that start or end with this letter, but not both, regardless of the letter case.

For example:

>>> s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> getWords(s, "t")
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']

My attempt:

regex = (r'[\w]*'+letter+r'[\w]*')
return (re.findall(regex,sentence,re.I))

My Output:

['The', 'TART', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'until', 'next']

Might help: http://stackoverflow.com/questions/247167/exclusive-or-in-regular-expression — SIGSTACKFAULT, Feb 19 '17 at 22:34
If you have already gotten responses, please do not modify your question in a way that invalidates answers. — TigerhawkT3, Feb 19 '17 at 22:44

Mark Tolonen · Answer 1 · 2017-02-19T23:07:19.823

\b detects word breaks. Verbose mode allows multi-line regexes and comments. Note that [^\W] is the same as \w, but to match \w except a certain letter, you need [^\W{letter}].

import re

def getWords(s,t):
    pattern = r'''(?ix)           # ignore case, verbose mode
                  \b{letter}      # start with letter
                  \w*             # zero or more additional word characters
                  [^{letter}\W]\b # ends with a word character that isn't letter
                  |               #    OR
                  \b[^{letter}\W] # starts with a word character that isn't letter
                  \w*             # zero or more additional word characters
                  {letter}\b      # ends with letter
                  '''.format(letter=t)
    return re.findall(pattern,s)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(s,'t'))

Output:

['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
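
As a small illustration of the pieces used in the pattern above (a sketch, not part of the original answer): (?i) makes matching case-insensitive, [^t\W] matches exactly one word character other than t, and (?x) lets the pattern contain whitespace and comments. The sample strings and variable names are made up for the demonstration.

```python
import re

# (?i) turns on case-insensitive matching for the whole pattern,
# so a lowercase t in the pattern also matches an uppercase T.
case_insensitive = re.findall(r'(?i)t\w*', 'Tea time')
print(case_insensitive)   # ['Tea', 'time']

# [^t\W] is a negated class: one character that is neither 't' nor a
# non-word character, i.e. a single word character other than 't'.
not_t = re.findall(r'[^t\W]', 'tot, a_1')
print(not_t)              # ['o', 'a', '_', '1']

# (?x) ignores whitespace and # comments inside the pattern.
verbose = r'''(?x)
    \b t      # word boundary followed by t
    \w*       # rest of the word
'''
starts_with_t = re.findall(verbose, 'two cats trot')
print(starts_with_t)      # ['two', 'trot']
```

The \W inside the negated class is what keeps punctuation and spaces from counting as "a character that isn't the letter".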

Thank you Mark, it does solve my issue. I learned a new thing with .format. But I am lost in the expression. I understood \b{letter} and \w*, but what does the i in (?i) signify, and why did you use \W in the OR block? — Asheem, Feb 19 '17 at 22:52

Wasi Ahmad · Answer 2 · 2017-02-19T23:15:58.617

This is much easier with the startswith() and endswith() methods.

def getWords(s, letter):
    letter = letter.lower()
    # keep words that start or end with the letter, but not both
    return [word for word in s.split()
            if (word.lower().startswith(letter) or word.lower().endswith(letter))
            and not (word.lower().startswith(letter) and word.lower().endswith(letter))]

mystring = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(mystring, 't'))

Output

['The', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
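
Note that this output contains 'Thursdays,' with its trailing comma, because split() only separates on whitespace. One possible way around that, assuming it is acceptable to strip leading and trailing punctuation from each word, is sketched below. The name get_words_no_punct is hypothetical and not from the original answer.

```python
import string

def get_words_no_punct(sentence, letter):
    """Split-based variant that strips surrounding punctuation first."""
    letter = letter.lower()
    result = []
    for raw in sentence.split():
        # Remove punctuation such as ',' or '.' from both ends of the word.
        word = raw.strip(string.punctuation)
        if not word:
            continue  # the token was punctuation only
        lowered = word.lower()
        # != on two booleans acts as exclusive or: start or end, but not both.
        if lowered.startswith(letter) != lowered.endswith(letter):
            result.append(word)
    return result

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_no_punct(s, "t"))
```

For the example sentence this should give the same list as the expected output in the question, with 'Thursdays' free of the comma.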

Update (using regular expression)

import re
result1 = re.findall(r'\b[t]\w+|\w+[t]\b', mystring, re.I)
result2 = re.findall(r'\b[t]\w+[t]\b', mystring, re.I)
print([x for x in result1 if x not in result2])

Explanation

The regular expression \b[t]\w+|\w+[t]\b finds words that start or end with the letter t, and \b[t]\w+[t]\b finds words that both start and end with the letter t.

After generating the two lists of words, take their difference: keep the words from the first list that do not appear in the second.
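
The two-pass idea can also be wrapped in a function that takes the letter as a parameter instead of hard-coding t. This is only a sketch: get_words_two_pass is a hypothetical name, and re.escape is added as a precaution in case the letter is a regex metacharacter.

```python
import re

def get_words_two_pass(sentence, letter):
    """Find start-or-end matches, then drop the start-and-end matches."""
    ch = re.escape(letter)
    # Pass 1: words that start with the letter or end with it.
    either = re.findall(r'\b{0}\w+|\w+{0}\b'.format(ch), sentence, re.I)
    # Pass 2: words that both start and end with the letter.
    both = set(re.findall(r'\b{0}\w+{0}\b'.format(ch), sentence, re.I))
    # Keep the words from pass 1 that pass 2 did not find.
    return [word for word in either if word not in both]

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(get_words_two_pass(s, "t"))
```

Because the second pattern uses \w+ between the two letters, a two-letter word such as "tt" would not be caught by it and would slip through the filter. The single-pattern answers do not have that limitation.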

I have updated my answer. I request to reconsider downvoting! — Wasi Ahmad, Feb 19 '17 at 22:41

trincot · Accepted Answer · 2017-02-19T22:52:49.703

If you want a regex for this, then use:

regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)

The replace call avoids repeating the verbose +letter+ concatenation.

The complete code then looks like this:

import re

def getWords(sentence, letter):
    regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)
    return re.findall(regex, sentence, re.I)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
result = getWords(s, "t")
print(result)

Output:

['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']

Explanation

I have used # as a placeholder for the actual letter, and that will get replaced in the regular expression before it is actually used.

\b: word break
\w*: 0 or more word characters (letters, digits, or underscores)
[^#\W]: a single word character that is not # (the given letter)
|: logical OR. The left side matches words that start with the letter, but don't end with it, and the right side matches the opposite case.
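
To see what each side of the | contributes, the two alternatives can be run separately. This is a minimal sketch with a made-up sample string. The variable names are hypothetical, and .format is used here instead of .replace only to keep the two halves readable.

```python
import re

letter = 't'

# Left side of the |: starts with the letter, ends with a different word character.
starts_only = r'\b{0}\w*[^{0}\W]\b'.format(letter)
# Right side of the |: starts with a different word character, ends with the letter.
ends_only = r'\b[^{0}\W]\w*{0}\b'.format(letter)

text = "The TART start it"
left = re.findall(starts_only, text, re.I)
right = re.findall(ends_only, text, re.I)

print(left)    # ['The']
print(right)   # ['start', 'it']
```

Neither side matches 'TART', because each alternative demands that one end of the word is the letter and the other end is not. Both sides also need at least two characters, so a lone "t" is never matched.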

edited Feb 19 '17 at 22:52

answered Feb 19 '17 at 22:44

Thank you very much I learned about .format() in one above answer and .replace() from yours. My question is what is the purpose of : [^#\W]|[^#\W] – Asheem Feb 19 '17 at 22:59
`[^#\W]` : Does it relate to one character or one word in the string? – Asheem Feb 19 '17 at 23:06
It relates to one character as there is no `+` or `*` following it. It says: "match one character that is not my letter and is not a non-letter either", which comes down to "match a letter that is different from my letter". The OR operator `|` extends further than that: it applies to the part between parentheses (or the whole pattern if there would have been none), so it says it should match either `#\w*[^#\W]` or `[^#\W]\w*#`. – trincot Feb 20 '17 at 06:17

TigerhawkT3 · Answer 4 · 2017-02-19T23:06:58.983

Why are you using regex for this? Just check the first and last character.

def getWords(s, letter):
    letter = letter.lower()
    # s.split() never yields empty strings, so word[0] and word[-1] are safe
    return [word for word in s.split()
            if (word[0].lower() == letter) != (word[-1].lower() == letter)]

edited Feb 19 '17 at 23:06

answered Feb 19 '17 at 22:39

I am asked to use regex only. – Asheem Feb 19 '17 at 22:41

rassar · Answer 5 · 2017-02-19T22:44:30.590

You can try the built-in startswith and endswith string methods.

>>> string = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> [i for i in string.split() if i.lower().startswith('t') or i.lower().endswith('t')]
['The', 'TART', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']

edited Feb 19 '17 at 22:44

answered Feb 19 '17 at 22:39
