
I'm brushing up on my RegEx skills in Python and working through some practice problems. The prompt asked me to find the words with exactly 8 letters in the string below:

import re

str='''Au pays parfume que le soleil caresse,
J'ai connu, sous un dais d'arbres tout empourpres
Et de palmiers d'ou pleut sur les yeux la paresse,
Une dame creole aux charmes ignores.'''

Here is the prescribed solution:

regex = r'\w{8}'

emails=re.findall(regex, str)

print(emails)

This returns ['empourpr', 'palmiers']. However, 'empourpr' is not a separate word but part of a longer one ('empourpres'). Is there a reason that RegEx seems to be pulling in something that doesn't fit the alphanumeric length specified in the re.findall call? Also, are there any best practices to avoid something like this? Thanks!

shaik moeed
MJP

1 Answer


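The pattern \w{8} matches any run of 8 consecutive word characters, even when that run sits inside a longer word, which is why 'empourpr' is returned. You can put the word boundary \b on both sides of the pattern to avoid matching parts of longer words. Consider the following simple example: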

import re
text = "quick fox jumps over the lazy dog"
print(re.findall(r'\b\w{3}\b',text))

which gives the output

['fox', 'the', 'dog']
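
To tie this back to the question, here is a small sketch that runs both patterns on the same poem. The variable names (text, unanchored, anchored) are my own choice, and I renamed str to text so the built-in str is not shadowed. Note that \w also matches digits and underscores, so this is only a "letters" count as long as the input contains neither.

```python
import re

# Same text as in the question; renamed from `str` to avoid shadowing the built-in.
text = '''Au pays parfume que le soleil caresse,
J'ai connu, sous un dais d'arbres tout empourpres
Et de palmiers d'ou pleut sur les yeux la paresse,
Une dame creole aux charmes ignores.'''

# Without boundaries: any 8 consecutive word characters match,
# including the first 8 characters of a longer word.
unanchored = re.findall(r'\w{8}', text)

# With \b on both sides: the 8 characters must form a whole word.
anchored = re.findall(r'\b\w{8}\b', text)

print(unanchored)  # ['empourpr', 'palmiers']
print(anchored)    # ['palmiers']
```

Using a raw string (the r prefix) matters here: in a normal string literal, \b would be interpreted as a backspace character rather than a regex word boundary.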
Daweo
  • Thanks. This is what I was looking for, wasn't aware of the word boundary syntax. – MJP Jul 27 '23 at 19:26