Find words of length 4 using a regular expression

Question

I am trying to find the words of length 4 in a string using a regular expression.

I am trying this but I am getting an empty list:

#words that have length of 4
s = input("please enter an expression: ")
print(re.findall(r'/^[a-zA-Z]{4}$/',s))

What is wrong with my code?

My input is: here we are having fun these days

My expected output: ['here', 'days']

My output: []

By adding `^` and `$`, you're requiring that the *entire* string be a word of length 4, not finding all words within it of length 4. — David Robinson, Apr 17 '15 at 03:06

Avinash Raj · Accepted Answer · 2015-04-17T03:27:02.040

12

Use word boundaries, \b. Anchors, as in ^[a-zA-Z]{4}$, match only when the entire string (or line, in multiline mode) consists of exactly four letters; they do not check each individual word. ^ asserts that we are at the start and $ asserts that we are at the end. Also note that Python regex patterns are not wrapped in /.../ delimiters, so the slashes in your pattern are treated as literal characters. \b matches between a word character and a non-word character (or vice versa), so it matches, with zero width, at the start or the end of a word.

>>> import re
>>> s = "here we are having fun these days"
>>> re.findall(r'\b[a-zA-Z]{4}\b', s)
['here', 'days']
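As a rough illustration of the difference described above, the following sketch runs the anchored pattern, the original pattern with its slashes, and the word-boundary pattern against the same input. The variable names are only for illustration and are not part of the original answer.

```python
import re

s = "here we are having fun these days"

# Anchored pattern: matches only if the whole string is exactly four letters.
anchored = re.findall(r'^[a-zA-Z]{4}$', s)

# Pattern from the question: the slashes are literal characters in Python,
# so a '/' would have to appear before the start of the string. It cannot match.
with_slashes = re.findall(r'/^[a-zA-Z]{4}$/', s)

# Word boundaries: each four-letter word is matched wherever it appears.
bounded = re.findall(r'\b[a-zA-Z]{4}\b', s)

print(anchored)      # []
print(with_slashes)  # []
print(bounded)       # ['here', 'days']

# The anchored pattern does match when the input is a single four-letter word.
single_word = re.findall(r'^[a-zA-Z]{4}$', "here")
print(single_word)   # ['here']
```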

edited Apr 17 '15 at 03:27

answered Apr 17 '15 at 03:07


It may be useful to mention that `^` matches the start of the string and `$` the end to show that you needed to remove these to match substrings that are arbitrarily placed in the input. – meiamsome Apr 17 '15 at 03:10
You could also use `re.findall(r'(?<!\S)[a-zA-Z]{4}(?!\S)', s)`. – Avinash Raj Apr 17 '15 at 03:23
I recommend reading the following post [Regex to match words of a certain length.](http://stackoverflow.com/questions/9043820/regex-to-match-words-of-a-certain-length) – Jose Ricardo Bustos M. Apr 17 '15 at 03:33
@Avinash Raj Does \b[a-zA-Z]{4}\b also work if the word is at the beginning or end of the string? – Jose Ricardo Bustos M. Apr 17 '15 at 03:34
@JoseRicardoBustosM. Yes, because the positions before the start and after the end of the string are treated as non-word characters. – Avinash Raj Apr 17 '15 at 03:36
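The two patterns mentioned in this thread are not interchangeable once punctuation is involved. The sketch below uses a made-up input string (an assumption for illustration, not taken from the question) to show that \b treats punctuation as a word edge, while the lookaround version only accepts words delimited by whitespace or the string ends.

```python
import re

# Hypothetical input containing a hyphen, an apostrophe, and a comma.
s = "half-time isn't over, they said"

# \b sees '-' and ',' as non-word characters, so "half", "time" and "over" qualify.
boundary = re.findall(r'\b[a-zA-Z]{4}\b', s)

# (?<!\S) and (?!\S) require whitespace or the string edge on both sides,
# so words touching punctuation are rejected.
whitespace = re.findall(r'(?<!\S)[a-zA-Z]{4}(?!\S)', s)

print(boundary)    # ['half', 'time', 'over', 'they', 'said']
print(whitespace)  # ['they', 'said']
```

Which behaviour is preferable depends on whether "half-time" should count as two words or as one token.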

A.J. Uppal · Answer 2 · 2015-04-17T03:43:33.023

1

No need for a (possibly) complicated regex, you can just use a list comprehension:

>>> s = "here we are having fun these days"
>>> [word for word in s.split() if len(word) == 4 and word.isalpha()]
['here', 'days']
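One caveat with the split-based approach: it gives the same result as the regex for the input in the question, but the two can differ when words carry punctuation. The sketch below uses a made-up sentence (an assumption for illustration) to show this.

```python
import re

# Hypothetical input with a comma and a full stop attached to words.
s = "here we are, these days."

# split() keeps punctuation attached: "are," has length 4 but is not alphabetic,
# and "days." has length 5, so only "here" survives.
split_based = [word for word in s.split() if len(word) == 4 and word.isalpha()]

# The word-boundary regex ignores the trailing punctuation.
regex_based = re.findall(r'\b[a-zA-Z]{4}\b', s)

print(split_based)  # ['here']
print(regex_based)  # ['here', 'days']
```

Note also that str.isalpha() accepts non-ASCII letters, whereas [a-zA-Z] does not, so the two approaches can disagree on accented words as well.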

edited Apr 17 '15 at 03:43

answered Apr 17 '15 at 03:31


It must be `[word for word in s.split() if len(word) == 4 and word.isalpha()]` – Avinash Raj Apr 17 '15 at 03:35
This is fine, but it is likely to be much slower than the regexp solution in general. The regexp is really not complicated, as far as regular expressions go (and regular expressions are too useful not to learn). – Eric O. Lebigot Apr 17 '15 at 03:47
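Relative speed is easy to measure rather than guess. The following is a minimal timing sketch using the standard timeit module; the repeated input, the helper names with_regex and with_split, and the iteration count are all assumptions chosen for illustration, and the outcome will vary with the Python version and the input.

```python
import re
import timeit

# Repeat the sample sentence to get a longer input (arbitrary size).
s = "here we are having fun these days " * 1000

# Compile once so pattern compilation is not part of the measurement.
pattern = re.compile(r'\b[a-zA-Z]{4}\b')

def with_regex(text):
    return pattern.findall(text)

def with_split(text):
    return [word for word in text.split() if len(word) == 4 and word.isalpha()]

# Both approaches should agree on this punctuation-free input.
assert with_regex(s) == with_split(s)

regex_time = timeit.timeit(lambda: with_regex(s), number=100)
split_time = timeit.timeit(lambda: with_split(s), number=100)
print(f"regex: {regex_time:.4f}s  split: {split_time:.4f}s")
```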
