I was studying Python and ran into the following statement for pandas:

humans['Education'] = (humans[
    'Years of post-secondary education (e.g. BA=4; Ph.D.=10)'].str.replace(r'.*=', '').astype(int))

I understand that it creates an additional column, but I am not sure what the replace(r'.*=','') call does.

Jae-Eun Park
  • r'.*=' is a regular expression (regex). The . matches any character, and the * means match zero or more of the preceding token (the dot). The final token = matches a literal '=' character. Anything matching this pattern will be replaced with an empty string. – Oliver Scott Aug 21 '19 at 14:51

1 Answer

This is a regular expression (see https://en.wikipedia.org/wiki/Regular_expression), so in your case the function replaces whatever matches the expression with nothing (the '' given as the second parameter is the replacement text).

To see what the regular expression is doing, you can use https://regex101.com/. In your case, it removes all text up to and including the = sign. Because .* is greedy, that means the last = sign if a value contains more than one. Note that the r in front of the quoted string is explained here: What does the "r" in pythons re.compile(r' pattern flags') mean?
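
As a rough illustration, here is a minimal sketch that applies the same pattern with Python's built-in re module instead of pandas. The sample strings are made up, since the question does not show the actual column values. In pandas, Series.str.replace performs this substitution on every cell; note that recent pandas versions treat the pattern as a literal string unless regex=True is passed.

```python
import re

# Hypothetical cell values (assumption: the real column contents are not shown).
samples = ["BA=4", "Ph.D.=10", "6", "a=b=7"]

# ".*" greedily matches any run of characters, "=" matches a literal equals sign,
# so the match extends up to and including the LAST "=" in the string.
pattern = r".*="

# Replace the matched prefix with an empty string; strings without "=" are unchanged.
cleaned = [re.sub(pattern, "", s) for s in samples]

# Equivalent of .astype(int) for a plain list.
years = [int(s) for s in cleaned]

print(cleaned)
print(years)
```

A value without any = sign passes through untouched, which is why the int conversion still works for cells that already contain only a number.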

Chelmy88