
I want to write a regex that makes Python return the items in a list that contain a run of consecutive vowels, where the run length is given by len = 2.

>>> chars = "aeiou"
>>> len = 2
>>> regex = re.compile(r"[+{}+]{{len}}",format(chars))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 234, in compile
    return _compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/re.py", line 286, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/sre_compile.py", line 764, in compile
    p = sre_parse.parse(p, flags)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/sre_parse.py", line 930, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
TypeError: unsupported operand type(s) for &: 'str' and 'int'
>>> 
>>> def funct(regex,list):
...     for item in list:
...         if regex.search(item):
...             print(item)
... 
>>> list = ['avid','Chaos','st','Cy']
>>> 
>>> funct(regex,list)
avid
Chaos

I should only be getting Chaos, not avid. I'm having trouble understanding how to pass the len parameter into re.compile.

halo09876

2 Answers


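Your misuse of formatting has nothing to do with regular expressions. On top of everything else, you appear to be mixing an f-string with the format method incorrectly. Among other things, an f-string needs the f prefix, and methods are invoked with a period, not a comma. With the comma, format(chars) is a call to the built-in format function, and its result (the string "aeiou") is passed as the flags argument of re.compile, which is what raises the TypeError for & between 'str' and 'int'.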

The two formatting operations can do the same job, and when combined they have a clearly defined evaluation order (the f-string is evaluated first, then the format method is called on the result). However, it is generally better to use one or the other, not both. Things get unnecessarily complicated otherwise.

Using f-strings:

regex = re.compile(f"[{chars}]{{{len}}}")

Double braces are interpreted as literal braces in format strings. You need another, third set, to indicate that len is a formatted expression.
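
As a quick check of the f-string version, here is a minimal sketch that builds the pattern and filters the list from the question. The names n, words and matches are my own (n is used instead of len so the built-in is not shadowed), and it assumes chars contains no characters that are special inside a character class:

```python
import re

chars = "aeiou"
n = 2  # hypothetical name; replaces `len` to avoid shadowing the built-in

# {chars} and {n} are substituted; {{ and }} become literal braces.
pattern = f"[{chars}]{{{n}}}"
print(pattern)  # [aeiou]{2}

regex = re.compile(pattern)
words = ["avid", "Chaos", "st", "Cy"]

# Keep only the words that contain n consecutive vowels.
matches = [w for w in words if regex.search(w)]
print(matches)  # ['Chaos']
```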

Using format:

regex = re.compile("[{}]{{{}}}".format(chars, len))
regex = re.compile("[{chars}]{{{len}}}".format(chars=chars, len=len))
regex = re.compile("[{0}]{{{len}}}".format(chars, len=len))
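
All three format calls should produce the same pattern string. A small sketch to confirm that, again using a hypothetical n in place of len:

```python
chars = "aeiou"
n = 2  # hypothetical name; replaces `len` to avoid shadowing the built-in

positional = "[{}]{{{}}}".format(chars, n)             # automatic numbering
keyword = "[{chars}]{{{n}}}".format(chars=chars, n=n)  # named fields
mixed = "[{0}]{{{n}}}".format(chars, n=n)              # explicit index plus a named field

print(positional, keyword, mixed)
```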

Using both (for completeness):

regex = re.compile(f"[{{}}]{{{{{len}}}}}".format(chars))
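
To see why the combined version needs so many braces, it may help to look at the string after each pass. This is only an illustration with names of my own choosing (n, after_fstring, final):

```python
chars = "aeiou"
n = 2  # hypothetical name; replaces `len` to avoid shadowing the built-in

# Pass 1: the f-string substitutes {n} and halves every doubled brace.
after_fstring = f"[{{}}]{{{{{n}}}}}"
print(after_fstring)  # [{}]{{2}}

# Pass 2: str.format fills the empty {} and halves the remaining doubled braces.
final = after_fstring.format(chars)
print(final)  # [aeiou]{2}
```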

In no case do you need + inside your character class. Inside square brackets, + matches a literal plus character. It does not act as some magical quantifier. Also, repeating a character in a character class is redundant.
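
A short sketch of the difference, using a made-up test string "a+b" that contains a vowel followed by a plus sign:

```python
import re

with_plus = re.compile(r"[+aeiou+]{2}")   # the class also accepts a literal '+'
without_plus = re.compile(r"[aeiou]{2}")  # vowels only

# "a+" counts as two class members only when '+' is in the class.
plus_hit = with_plus.search("a+b")
vowel_hit = without_plus.search("a+b")

print(plus_hit is not None)   # True
print(vowel_hit is not None)  # False
```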

Since your string does not have any backslashes in it, it doesn't need to be a raw string, and doesn't need the r prefix.

Mad Physicist

You can use an f-string by adding an f before the opening quote of the string literal, so that one pair of curly brackets around len evaluates its value as part of the string, and use a . (rather than a ,) to invoke the format method of the string. But since the f-string is evaluated first, before the result is passed to str.format, the empty curly brackets {} have to be doubled so that the f-string parser preserves them literally. And since you also need literal curly brackets around the value of len for it to act as a quantifier in your regex, those have to be escaped twice, once for the f-string and once more for str.format:

regex = re.compile(fr"[+{{}}+]{{{{{len}}}}}".format(chars))

Since curly brackets have special meanings in f-strings, str.format and regex alike, I would suggest formatting your string with the string formatting operator % instead, so you don't have to deal with the escape hell above:

regex = re.compile(r'[+%s+]{%d}' % (chars, len))
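
Here is a minimal sketch of the % approach applied to the list from the question. The names n and words are my own, and I have left the + out of the character class here, since inside square brackets it would also match a literal plus sign:

```python
import re

chars = "aeiou"
n = 2  # hypothetical name; replaces `len` to avoid shadowing the built-in

# %s and %d are substituted; the braces need no escaping for the % operator.
pattern = r"[%s]{%d}" % (chars, n)
print(pattern)  # [aeiou]{2}

regex = re.compile(pattern)
words = ["avid", "Chaos", "st", "Cy"]
for word in words:
    if regex.search(word):
        print(word)  # prints only: Chaos
```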

blhsing