Finding all occurrences of a word in a string in Python3

Question

I am trying to find all words containing "hell" in a sentence. There are 3 occurrences in the string below, but re.search is returning only the first 2 occurrences. I tried both findall and search. Can someone please tell me what is wrong here?

>>> import re
>>> s = 'heller pond hell hellyi'
>>> m = re.findall(r'(hell)\S*', s)
>>> m.group(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'group'
>>> m = re.search(r'(hell)\S*', s)
>>> m.group(0)
'heller'
>>> m.group(1)
'hell'
>>> m.group(2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: no such group
>>>

score 5 · Accepted Answer · answered Feb 06 '15 at 17:42

You can use re.findall and search for hell with zero or more word characters on either side:

>>> import re
>>> s = 'heller pond hell hellyi'
>>> re.findall(r'\w*hell\w*', s)
['heller', 'hell', 'hellyi']
>>>
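
As a small illustration of what "word characters on either side" means in practice, the sketch below compares \w* with \S* on a string that contains punctuation. The sample text is made up for this example and is not from the question.

```python
import re

# Hypothetical sample text with punctuation attached to some words.
text = 'shell, hello! hell-bent'

# \w* stops at punctuation, so only letters, digits and underscores are kept.
word_chars = re.findall(r'\w*hell\w*', text)

# \S* keeps every non-space character, including punctuation and hyphens.
non_space = re.findall(r'\S*hell\S*', text)

print(word_chars)  # ['shell', 'hello', 'hell']
print(non_space)   # ['shell,', 'hello!', 'hell-bent']
```

Which of the two is preferable depends on whether trailing punctuation should count as part of the word.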

This one works perfectly. Do you know why this is wrong? >>> m = re.search('(hell)\S*', s). It returns only the first 2 occurrences; "hellyi" is not returned. – Vin Feb 06 '15 at 17:53
No, it is not returning two occurrences. `re.search` only gets the first. You are getting `hell` because that is the value matched by the capture group. It is still a part of `heller` though. – Feb 06 '15 at 17:59
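
To make the point in the comment above concrete, here is a minimal sketch of how re.search, re.findall and re.finditer each treat the pattern from the question. The variable names are only illustrative.

```python
import re

s = 'heller pond hell hellyi'
pattern = r'(hell)\S*'

# re.search stops at the first match. group(0) is the whole match,
# group(1) is the text captured by the parentheses.
first = re.search(pattern, s)
search_result = (first.group(0), first.group(1))

# With exactly one capture group, re.findall returns only the captured text.
captured = re.findall(pattern, s)

# re.finditer yields match objects, so the whole match is available per hit.
whole_words = [m.group(0) for m in re.finditer(pattern, s)]

print(search_result)  # ('heller', 'hell')
print(captured)       # ['hell', 'hell', 'hell']
print(whole_words)    # ['heller', 'hell', 'hellyi']
```

So 'heller' and 'hell' in the question are two views of the same first match, not two separate occurrences.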

score 2 · Answer 2 · answered Feb 06 '15 at 17:39

You can use str.split and see if the substring is in each word:

s = 'heller pond hell hellyi'

print([w for w in s.split() if "hell" in w])

Padraic Cunningham

Thanks! Is there a way to do it via regular expressions? I am still learning them and really want to try it using RE. – Vin Feb 06 '15 at 17:40
@Vin, yes, Icodez's answer simply using findall will work. If you don't care about efficiency, use a regex. – Padraic Cunningham Feb 06 '15 at 17:45
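
The efficiency remark can be checked locally. The following is only a rough sketch using timeit; the helper names with_split and with_regex and the repeated test string are assumptions made for this example, and the timings depend on the machine and the input, so no numbers are shown.

```python
import re
import timeit

# Build a longer input by repeating the sentence from the question.
s = ' '.join(['heller pond hell hellyi'] * 1000)

def with_split(text):
    # Split on whitespace and keep the words that contain the substring.
    return [w for w in text.split() if 'hell' in w]

pattern = re.compile(r'\S*hell\S*')

def with_regex(text):
    # Let the regex engine find every run of non-space characters around hell.
    return pattern.findall(text)

# Both approaches should agree on this input before they are timed.
assert with_split(s) == with_regex(s)

split_time = timeit.timeit(lambda: with_split(s), number=100)
regex_time = timeit.timeit(lambda: with_regex(s), number=100)
print(f'split: {split_time:.4f}s  regex: {regex_time:.4f}s')
```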

Answer 3 · Adam Smith · answered Feb 06 '15 at 17:47

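Your regex does match all three words, but because the pattern contains a capture group, re.findall returns only the captured hell for each match, and re.search stops at the first match. If you only want the occurrences themselves, just look for a literal hell -- nothing fancy.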

In [3]: re.findall('hell', 'heller pond hell hellyi')
Out[3]: ['hell', 'hell', 'hell']

EDIT

Per your comment, you want to return the whole word when hell appears anywhere inside it. In that case, use the * (zero-or-more) quantifier on either side of hell.

In [4]: re.findall(r"\S*hell\S*", 'heller pond hell hellyi')
Out[4]: ['heller', 'hell', 'hellyi']

In other words:

re.compile(r"""
    \S*          # zero or more non-space characters
    hell         # followed by a literal hell
    \S*          # followed by zero or more non-space characters""", re.X)
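
A possible way to use the verbose pattern, assuming it is bound to a name (word_with_hell is a hypothetical name chosen for this sketch):

```python
import re

# re.X (verbose mode) ignores whitespace in the pattern and allows # comments.
word_with_hell = re.compile(r"""
    \S*     # zero or more non-space characters
    hell    # a literal hell
    \S*     # zero or more non-space characters
    """, re.X)

matches = word_with_hell.findall('heller pond hell hellyi')
print(matches)  # ['heller', 'hell', 'hellyi']
```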

Note that Padraic's answer is definitely the BEST way to go about this:

[word for word in "heller pond hell hellyi".split() if 'hell' in word]

But I want it to return "heller", "hell", "hellyi". So I have to give \S or some other escape char. – Vin Feb 06 '15 at 17:44

score 0 · Answer 4 · answered Aug 12 '15 at 20:07

Maybe it's just me, but I use regex very little. Python 3 has extensive text functions; what is wrong with using the built-in method?

'heller pond hell hellyi'.count('hell')

The only drawback I see is that this way I never really learn to use regex. :-)
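
Note that str.count returns how many times the substring occurs, not the words that contain it. As a minimal sketch, the loop below uses str.find to also collect the starting positions; the variable names are illustrative.

```python
s = 'heller pond hell hellyi'

# str.count gives the number of non-overlapping occurrences.
total = s.count('hell')

# str.find returns the index of the next occurrence, or -1 when there is none.
positions = []
start = s.find('hell')
while start != -1:
    positions.append(start)
    start = s.find('hell', start + 1)

print(total)      # 3
print(positions)  # [0, 12, 17]
```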

henkidefix
