
Write a function called getWords(sentence, letter) that takes in a sentence and a single letter, and returns a list of the words that start or end with this letter, but not both, regardless of the letter case.

For example:

>>> s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> getWords(s, "t")
['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']

My attempt:

import re
def getWords(sentence, letter):
    regex = r'\w*' + letter + r'\w*'
    return re.findall(regex, sentence, re.I)

My Output:

['The', 'TART', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'until', 'next']
TigerhawkT3
Asheem

5 Answers


\b matches a word boundary. Verbose mode (the x flag) allows multi-line regexes with comments. Note that [^\W] is the same as \w, but to match \w except a certain letter, you need [^\W{letter}].
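A quick check of that character-class trick, assuming Python's re module with re.IGNORECASE: [^\Wt] matches any single word character except t, and because of the flag it excludes T as well.

```python
import re

# [^\Wt] = "not a non-word character and not t", i.e. any word character except t.
# With re.IGNORECASE the excluded letter covers both t and T.
chars = re.findall(r'[^\Wt]', 'Tart_9!', re.I)
print(chars)  # ['a', 'r', '_', '9']
```

Note that underscores and digits count as word characters, so they pass the class too, while "!" is rejected as a non-word character.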

import re

def getWords(s,t):
    pattern = r'''(?ix)           # ignore case, verbose mode
                  \b{letter}      # start with letter
                  \w*             # zero or more additional word characters
                  [^{letter}\W]\b # ends with a word character that isn't letter
                  |               #    OR
                  \b[^{letter}\W] # does not start with a non-word character or letter
                  \w*             # zero or more additional word characters
                  {letter}\b      # ends with letter
                  '''.format(letter=t)
    return re.findall(pattern,s)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(s,'t'))

Output:

['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']
Mark Tolonen
  • Thank you Mark, it does solve my issue. I learned something new with .format. But I am lost in the expression: I understood \b{letter} and \w*, but what does the i signify in (?i), and why did you use \W in the OR block? – Asheem Feb 19 '17 at 22:52
  • @AsheemChhetri Added comments. – Mark Tolonen Feb 19 '17 at 22:58

This is much easier with the str.startswith() and str.endswith() methods.

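def getWords(s, letter):
    # Use the function's own arguments instead of the global mystring and a hard-coded 't'
    letter = letter.lower()
    return [word for word in s.split()
            if (word.lower().startswith(letter) or word.lower().endswith(letter))
            and not (word.lower().startswith(letter) and word.lower().endswith(letter))]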

mystring = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(mystring, 't'))

Output

['The', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
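Because split() breaks only on whitespace, trailing punctuation stays attached ('Thursdays,' above), and a word such as 'week.' is tested on its final period. A minimal sketch that strips punctuation from each word before testing, assuming string.punctuation covers the characters you care about:

```python
import string

def getWords(s, letter):
    letter = letter.lower()
    result = []
    for word in s.split():
        # Drop leading/trailing punctuation such as ',' or '.' before testing.
        w = word.strip(string.punctuation)
        if not w:
            continue
        lw = w.lower()
        # "Start or end, but not both" is an exclusive or of the two tests.
        if lw.startswith(letter) != lw.endswith(letter):
            result.append(w)
    return result

mystring = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(mystring, 't'))
```

Comparing the two booleans with != is a compact way of writing "(A or B) and not (A and B)".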

Update (using a regular expression)

import re
result1 = re.findall(r'\b[t]\w+|\w+[t]\b', mystring, re.I)
result2 = re.findall(r'\b[t]\w+[t]\b', mystring, re.I)
print([x for x in result1 if x not in result2])

Explanation

The alternation \b[t]\w+|\w+[t]\b finds words that start or end with the letter t, and \b[t]\w+[t]\b finds words that both start and end with it.

After generating the two lists, keep the words from the first list that do not appear in the second; this is the difference of the two lists, not their intersection.
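The two-pattern idea can be parameterized by the letter instead of hard-coding t. This is a sketch under two assumptions of mine: the letter is escaped with re.escape so it is matched literally, and the "both" pattern uses \w* rather than \w+ so that a two-letter word such as "tt" is also removed.

```python
import re

def getWords(sentence, letter):
    l = re.escape(letter)
    # Words that start OR end with the letter (at least two characters long).
    either = re.findall(r'\b{0}\w+|\w+{0}\b'.format(l), sentence, re.I)
    # Words that start AND end with the letter (\w* also catches "tt").
    both = re.findall(r'\b{0}\w*{0}\b'.format(l), sentence, re.I)
    # Keep the first list minus the second: a difference, not an intersection.
    return [w for w in either if w not in both]

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(s, 't'))
```

As in the answer, words are compared by value, so every occurrence of a word found by the "both" pattern is dropped.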

Wasi Ahmad

If you want a regex for this, use:

regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)

The replace() call avoids the verbose, repeated +letter+ concatenation.
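Printing the pattern after the substitution shows the string the regex engine actually receives, here for the letter 't':

```python
letter = 't'
# Every '#' placeholder is replaced by the letter in a single replace() call.
regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)
print(regex)  # \b(t\w*[^t\W]|[^t\W]\w*t)\b
```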

So the code looks like this then:

import re

def getWords(sentence, letter):
    regex = r'\b(#\w*[^#\W]|[^#\W]\w*#)\b'.replace('#', letter)
    return re.findall(regex, sentence, re.I)

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
result = getWords(s, "t")
print(result)

Output:

['The', 'Tuesdays', 'Thursdays', 'but', 'it', 'not', 'start', 'next']

Explanation

I have used # as a placeholder for the actual letter; it is replaced in the pattern string before the regular expression is used.

  • \b: word boundary
  • \w*: 0 or more word characters (letters, digits, or underscores)
  • [^#\W]: a word character that is not # (the given letter)
  • |: logical OR. The left side matches words that start with the letter, but don't end with it, and the right side matches the opposite case.
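To see what each side of the | contributes, the two alternatives can be run separately. This sketch substitutes t for # purely for illustration:

```python
import re

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."

# Left side: starts with t, last character is a word character other than t.
starts_only = re.findall(r'\bt\w*[^t\W]\b', s, re.I)
# Right side: first character is a word character other than t, ends with t.
ends_only = re.findall(r'\b[^t\W]\w*t\b', s, re.I)

print(starts_only)  # ['The', 'Tuesdays', 'Thursdays']
print(ends_only)    # ['but', 'it', 'not', 'start', 'next']
```

In the combined pattern the parentheses form a capturing group, so re.findall returns group 1; here that group spans the whole match, so the results are the same words in sentence order.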
trincot
  • Thank you very much. I learned about .format() from an answer above and .replace() from yours. My question is: what is the purpose of [^#\W]|[^#\W]? – Asheem Feb 19 '17 at 22:59
  • `[^#\W]` : Does it relate to one character or one word in the string? – Asheem Feb 19 '17 at 23:06
  • It relates to one character as there is no `+` or `*` following it. It says: "match one character that is not my letter and is not a non-letter either", which comes down to "match a letter that is different from my letter". The OR operator `|` extends further than that: it applies to the part between parentheses (or the whole pattern if there were none), so it says it should match either `#\w*[^#\W]` or `[^#\W]\w*#`. – trincot Feb 20 '17 at 06:17

Why are you using regex for this? Just check the first and last character.

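def getWords(s, letter):
    letter = letter.lower()
    words = s.split()
    # [::len-1] keeps only the first and last characters; max(..., 1) avoids a zero step for one-letter words
    return [a for a, b in ((word, set(word.lower()[::max(len(word) - 1, 1)])) for word in words)
            if letter in b and len(b) == 2]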
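The same first-and-last-character check can be written out with word[0] and word[-1] instead of a slice and a set. A sketch; like the answer, it splits on whitespace, so punctuation stays attached to words:

```python
def getWords(s, letter):
    letter = letter.lower()
    result = []
    for word in s.split():
        first, last = word[0].lower(), word[-1].lower()
        # Exactly one end equals the letter: "start or end, but not both".
        if (first == letter) != (last == letter):
            result.append(word)
    return result

s = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
print(getWords(s, 't'))
```

A one-letter word has the same first and last character, so it is always excluded without any special-casing.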
TigerhawkT3

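You can try the built-in str.startswith() and str.endswith() methods. Note that this comprehension keeps words that start or end with the letter but does not exclude words that do both, so 'TART' is included: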

>>> string = "The TART program runs on Tuesdays and Thursdays, but it does not start until next week."
>>> [i for i in string.split() if i.lower().startswith('t') or i.lower().endswith('t')]
['The', 'TART', 'Tuesdays', 'Thursdays,', 'but', 'it', 'not', 'start', 'next']
rassar