0

I am trying to figure out how to incorporate regex into a python if statement. I have a pandas dataframe where I am iterating over the rows and want to perform an action every time the row has a specific combination of text. The regex should match any 7 character string that begins with a capital letter followed by 6 numbers (ie. R142389)

        for index, row in df1.iterrows():
             if row[4] == REGEX HERE:
                  Perform Action

Am I going about this the right way? Any help would be greatly appreciated!

johankent30
  • 65
  • 2
  • 3
  • 11

2 Answers2

2

Yes, you can do this, just use match, which will only match at the beginning of the string it is being compared to. You would have to use search to search the entire string.

A bit of explanation about the regex:

^ asserts position at start of the string

[A-Z] A-Z a single character in the range between A (index 65) and Z (index 90) (case sensitive)

\d{6} matches a digit (equal to [0-9]) {6} Quantifier — Matches exactly 6 times

$ asserts position at the end of the string, or before the line terminator right at the end of the string

import re

regex = re.compile('^[A-Z]\d{6}$')

possibles = ['R142389', 'hello', 'J123456']

for line in possibles:
    if regex.match(line):
        print(line)

Output:

R142389
J123456
user3483203
  • 50,081
  • 9
  • 65
  • 94
  • 2
    Be careful with `str = ['R142389', 'hello', 'J123456']`. Your overwriting a built-in name which can cause [strange errors](https://stackoverflow.com/questions/45740182/im-getting-typeerror-list-object-is-not-callable-how-do-i-fix-this-error) and the like. It'd would be best to use a name such as `strs` instead. Anyway's, nice answer. +1 – Christian Dean Jan 11 '18 at 21:21
  • If you recommend using `match`, you should mention that it will only match patterns at the beginning of the string. `search` will find matches in the whole string. – skrrgwasme Jan 11 '18 at 21:21
  • @ChristianDean whoops, fixed – user3483203 Jan 11 '18 at 21:24
  • @skrrgwasme Perhaps, but note that the OP _explictly_ said _"The regex should match any 7 character string that **begins** with a capital letter followed by 6 number"_. – Christian Dean Jan 11 '18 at 21:27
0

I would use the re module

import re

re.search(pattern, string, flags=0)

where pattern is the regular expression to be matched, string is the string going to be searched and flags which are optional modifiers. This funcion returns None when there is no match.

Here is the re documentation: https://docs.python.org/2/library/re.html

And here is an example of the implementation: https://www.tutorialspoint.com/python/python_reg_expressions.htm

Marcelo Villa-Piñeros
  • 1,101
  • 2
  • 12
  • 24