
I tried this code:

import re

x = re.search("f?e?males?\b", "russian male")

if x:
    print("YES! We have a match!")
else:
    print("No match")

But it prints "No match".

I'm trying to apply it to a DataFrame: if the string contains "male", it should return a different value.

But the regex doesn't work. Do you know why? I don't want to search for only "male", because I also want to match "female", "females", "males", etc.
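For the use case described above (return a different value when a string mentions male/female, singular or plural), a minimal sketch using the raw-string form of the pattern might look like this. The names `GENDER` and `label` and the returned strings are hypothetical placeholders, since the question does not say which value should be returned.

```python
import re

# Hypothetical compiled pattern; r"..." keeps \b as a regex word boundary.
GENDER = re.compile(r"f?e?males?\b")

def label(text):
    """Hypothetical helper: "match" if text mentions male/female, else "no match"."""
    return "match" if GENDER.search(text) else "no match"

rows = ["russian male", "two females", "russian"]
print([label(row) for row in rows])  # ['match', 'match', 'no match']
```

With a pandas column, the same function could be passed to `Series.apply`, or `Series.str.contains(r"f?e?males?\b")` could be used to get a boolean mask.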

Cateban
    Maybe try a raw string for the regex: `x = re.search(r"f?e?males?\b", "russian male")` or escape the `\b` with `\\b`. – Mark Apr 04 '20 at 23:01
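A quick check of what the comment describes: in a normal string literal, `\b` is Python's backspace escape, so the regex engine receives a literal backspace character instead of a word-boundary token. The variable names below are only for illustration.

```python
import re

text = "russian male"

# In a normal string literal, "\b" is the backspace character (U+0008).
assert "\b" == "\x08" and len("\b") == 1
# A raw string keeps the backslash: r"\b" is the two characters '\' and 'b'.
assert r"\b" == "\\b" and len(r"\b") == 2

broken = re.search("f?e?males?\b", text)     # searches for a literal backspace
raw = re.search(r"f?e?males?\b", text)       # \b is a word boundary
escaped = re.search("f?e?males?\\b", text)   # same pattern as the raw string

print(broken)           # None
print(raw.group())      # male
print(escaped.group())  # male
```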

3 Answers


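Use the r prefix when writing the pattern, e.g. r'f?e?males?\b'.

Without the prefix, Python turns \b into a backspace character before the regex engine ever sees it, so the pattern no longer contains a word boundary. More information can be found in the top answer here -> Python regex - r prefix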
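A small check of the raw-string pattern against the words from the question. It also shows why the trailing `s?` matters: without the `?`, the plural "s" becomes mandatory and "russian male" no longer matches. The names `good` and `strict` are just for illustration.

```python
import re

good = re.compile(r"f?e?males?\b")    # trailing "s" is optional
strict = re.compile(r"f?e?males\b")   # "s" is required

for text in ["russian male", "russian males", "female", "females"]:
    print(text, "->", bool(good.search(text)), bool(strict.search(text)))
# russian male -> True False
# russian males -> True True
# female -> True False
# females -> True True
```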


Add an 'r' in front of the regex: `x = re.search(r"f?e?males?\b", "russian male")`, because your regex contains a backslash ('\'). See Regular expression operations:

Regular expressions use the backslash character ('\') to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write '\\\\' as the pattern string, because the regular expression must be \\, and each backslash must be expressed as \\ inside a regular Python string literal. Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a DeprecationWarning and in the future this will become a SyntaxError. This behaviour will happen even if it is a valid escape sequence for a regular expression.

The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with 'r'. So r"\n" is a two-character string containing '\' and 'n', while "\n" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation.
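The two points from the quoted documentation can be checked directly: `r"\n"` versus `"\n"`, and the number of backslashes needed to match one literal backslash with and without the r prefix. The `path` string is a made-up example.

```python
import re

# r"\n" keeps both characters; "\n" is a single newline character.
assert r"\n" == "\\n" and len(r"\n") == 2
assert "\n" == chr(10) and len("\n") == 1

path = "C:\\temp"  # hypothetical example containing one literal backslash

# The regex needs \\ to match one backslash; a raw string writes it directly...
print(re.search(r"\\", path).start())   # 2
# ...while a normal string literal needs four backslashes for the same regex.
print(re.search("\\\\", path).start())  # 2
```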

AndrewQ

The problem seems to be the \b part of your regex. I think you want a negative lookahead here: `x = re.search(r"f?e?males?(?!\S)", "russian male")`. This matches "russian male", "russian male " and "russian males", but not "russian maley" or "russian male!".
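A side-by-side sketch of the negative lookahead and the raw-string `\b` version on the strings listed above. With a raw string, `\b` also matches "russian male"; the two patterns only differ when the word is followed by punctuation such as "!".

```python
import re

lookahead = re.compile(r"f?e?males?(?!\S)")  # next char must be whitespace or end of string
boundary = re.compile(r"f?e?males?\b")       # next char must not be a word character

results = {}
for text in ["russian male", "russian male ", "russian males",
             "russian maley", "russian male!"]:
    results[text] = (bool(lookahead.search(text)), bool(boundary.search(text)))
    print(repr(text), results[text])
# 'russian male' (True, True)
# 'russian male ' (True, True)
# 'russian males' (True, True)
# 'russian maley' (False, False)
# 'russian male!' (False, True)
```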

Oh, and as the other 2 answers pointed out: you need the r in front of your regex :)

Lydia van Dyke