How to pass an argument for regex repetitions (python)

Question

I'm learning about regex. If I want to find all the five-letter words in a string, I could use:

import re
text = 'The quick brown fox jumps over the lazy dog.'
print(re.findall(r"\b[a-zA-Z]{5}\b", text))

But I want to write a simple function whose argument includes the string and the length of the word being found. I tried this:

import re
def findwords(text, n):
    return re.findall(r"\b[a-zA-Z]{n}\b", text)

print(findwords('The quick brown fox jumps over the lazy dog.', 5))

But this returns an empty list. The n is not being recognized.

How can I specify an argument with the number of repetitions (or in this case, the length of the word)?

Possible duplicate of [How do I put a variable inside a String in Python?](https://stackoverflow.com/questions/2960772/how-do-i-put-a-variable-inside-a-string-in-python) — Aran-Fey, Mar 20 '18 at 21:03

Graipher · Answer 1 · 2018-03-20T21:37:50.117

score 5

Python does not magically fill the value of n into the string. You have to do that yourself, either with str.format:

r"\b[a-zA-Z]{{{}}}\b".format(n)

or, if you are running Python >= 3.6, use the new f-strings (which can be combined with the r prefix denoting a raw string):

fr"\b[a-zA-Z]{{{n}}}\b"

In both cases the doubled braces {{ and }} produce a literal { and }, and the innermost pair ({} or {n}) is the format placeholder.

If you want to avoid having to escape the literal braces, you can use the older %-formatting to achieve the same thing. For this, n must always be an integer (which it is here):

r"\b[a-zA-Z]{%i}\b" % n
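
Putting the three variants side by side, as a minimal sketch: the helper names (`pattern_format`, `pattern_fstring`, `pattern_percent`) are made up for illustration, and the character class is written as `[a-zA-Z]`. All three should build the same pattern string, so any of them can be used inside `findwords`.

```python
import re

def pattern_format(n):
    # str.format: {{ and }} are literal braces, {} is the placeholder
    return r"\b[a-zA-Z]{{{}}}\b".format(n)

def pattern_fstring(n):
    # f-string (Python >= 3.6), combined with the raw-string prefix
    return fr"\b[a-zA-Z]{{{n}}}\b"

def pattern_percent(n):
    # %-formatting: the braces need no escaping
    return r"\b[a-zA-Z]{%i}\b" % n

def findwords(text, n):
    # Any of the three builders would work here
    return re.findall(pattern_fstring(n), text)

text = 'The quick brown fox jumps over the lazy dog.'
print(pattern_format(5))   # \b[a-zA-Z]{5}\b
print(findwords(text, 5))  # ['quick', 'brown', 'jumps']
```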

edited Mar 20 '18 at 21:37

answered Mar 20 '18 at 20:47


This explains a lot. I see now how to use fr. But would the use of 6 brackets be clean enough Python? Is it something you'd see in professional programming? – cDub Mar 20 '18 at 21:16
@Christy Yes, I think so. There is always the alternative of using `%` formatting in that case, though: `r"\b[a-zA-Z]{%i}\b" % n`. – Graipher Mar 20 '18 at 21:36

score 4 · Answer 2 · answered Mar 20 '18 at 20:45

It's simpler than you may realize. There is nothing special about a "regex string": it is an ordinary text string. The only remotely remarkable thing is that it is usually written with the r prefix, because the backslash also has a meaning in unprefixed Python strings and you don't want to double every backslash. Apart from that, it is passed as-is to Python's re module.

So it doesn't really matter where the string comes from! Construct it any way you like, then feed the result into re.findall:

import re
def findwords(text, n):
    return re.findall(r"\b[a-zA-Z]{" + str(n) + r"}\b", text)

>>> findwords(text, 3)
['The', 'fox', 'the', 'dog']
>>> findwords(text, 4)
['over', 'lazy']

Note the repeated use of r. The prefix is a Python peculiarity, not a regex one, so every separate string literal needs it to prevent backslashes running rampant and messing up your carefully constructed expression.

(The function also accepts whatever it is given as n. Unless you test the argument and reject non-numbers, this will work as well:

>>> findwords(text, '5} {1')
['quick ', 'brown ', 'jumps ']

... which I did not.)
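
If you do want that check, one possible version is sketched below. The exception types and messages are my own choice for illustration, not something the question asks for.

```python
import re

def findwords(text, n):
    # Accept only a plain positive integer, so that a string such as
    # '5} {1' cannot alter the pattern. bool is a subclass of int,
    # so it is excluded explicitly.
    if isinstance(n, bool) or not isinstance(n, int):
        raise TypeError("n must be an int")
    if n < 1:
        raise ValueError("n must be at least 1")
    return re.findall(r"\b[a-zA-Z]{" + str(n) + r"}\b", text)

text = 'The quick brown fox jumps over the lazy dog.'
print(findwords(text, 5))  # ['quick', 'brown', 'jumps']

try:
    findwords(text, '5} {1')
except TypeError as exc:
    print(exc)  # n must be an int
```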

Still working on understanding; why would we change n into a string if it represents a length? — cDub, Mar 20 '18 at 21:19
@Christy Because `"a" + 5` is not defined in Python, whereas `"a" + str(5) == "a5"`. — Graipher, Mar 20 '18 at 21:39
@Christy: don't forget that a regex argument is still a *string*. There are no 'numbers' in it. The regex parser is responsible for recognizing any numbers as such, not Python. — Jongware, Mar 20 '18 at 22:25
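
A small sketch to make that concrete (the variable names `unfilled` and `filled` are only for illustration). In Python's `re`, braces that do not form a valid repetition count are taken as literal characters, so the un-substituted pattern looks for a letter followed by the text `{n}`:

```python
import re

n = 5
unfilled = r"[a-zA-Z]{n}"               # n is just a character in the string
filled = r"[a-zA-Z]{" + str(n) + r"}"   # the digit 5 is inserted as text

print(unfilled)  # [a-zA-Z]{n}
print(filled)    # [a-zA-Z]{5}

# '{n}' is not a valid repetition count, so re matches it literally.
print(re.findall(unfilled, "x{n} hello"))  # ['x{n}']
print(re.findall(filled, "x{n} hello"))    # ['hello']
```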

score 2 · Answer 3 · answered Mar 20 '18 at 21:00

This can be done very easily without building the regex pattern dynamically. Simply extract all words and then use a list comprehension to keep the words of length n.


import re

text = 'The quick brown fox jumps over the lazy dog.'
words = re.findall(r"[a-zA-Z]+", text)

print([w for w in words if len(w) == 3])

Result: ['The', 'fox', 'the', 'dog']
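
Wrapped into the function the question asks for, this might look like the sketch below (the name `findwords` is borrowed from the question). One difference from the `\b` version to be aware of: a run of letters directly next to a digit or underscore, such as the `abc` in `abc12`, is counted here but would not be matched by `\b[a-zA-Z]{3}\b`.

```python
import re

def findwords(text, n):
    # Collect every run of ASCII letters, then keep those of length n.
    words = re.findall(r"[a-zA-Z]+", text)
    return [w for w in words if len(w) == n]

text = 'The quick brown fox jumps over the lazy dog.'
print(findwords(text, 5))  # ['quick', 'brown', 'jumps']
```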
