I am trying to count the regex matches in a column and print the number found, but the code below keeps giving me 0. I suspect it is not iterating through the whole column. My code is below.

import re

pattern = ('/^[A-Z]{1}\d{8}$/i')
numbers = jan_df['Student Number']

iterator = re.finditer(pattern, str(numbers))
count = 0

for match in iterator:
    count+=1
print(count)
Redox
  • Does `pattern = r'^[A-Za-z]\d{8}$'` work? Do you mean `df = pd.DataFrame({'Student Number':['A12345678', 'abc', 'a12345678']})` should yield `2`? – Wiktor Stribiżew May 30 '22 at 07:54

1 Answer

You can use

df.loc[df['Student Number'].str.contains(r'^[A-Za-z]\d{8}$'), :].shape[0]

Or, if you plan to use a more specific regex and need to make it case insensitive:

df.loc[df['Student Number'].str.contains(r'^[A-Z]\d{8}$', case=False), :].shape[0]

# or

df.loc[df['Student Number'].str.contains(r'(?i)^[A-Z]\d{8}$'), :].shape[0]
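For a quick check, here is a minimal sketch of the case-insensitive variant run on the three sample values from the comment under the question. The sample frame is only an assumption standing in for the real data, which is not shown.

```python
import pandas as pd

# Sample values borrowed from the comment under the question;
# they stand in for the real 'Student Number' column.
df = pd.DataFrame({'Student Number': ['A12345678', 'abc', 'a12345678']})

# Boolean mask: True where the whole value is one letter followed by 8 digits.
mask = df['Student Number'].str.contains(r'^[A-Z]\d{8}$', case=False)

# Keep only the matching rows and count them.
count = df.loc[mask, :].shape[0]
print(count)  # 2
```

Only 'A12345678' and 'a12345678' satisfy the pattern, so the count is 2.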

Notes:

  • Regexes in Python are written as string literals, not regex literals, so the /.../i notation does not work: the slashes and the trailing i are treated as literal characters. Pass the bare pattern and set flags either through options (such as case=False) or as inline flags ((?i)...)
  • {1} is always redundant in regex patterns, please remove it
  • Series.str.contains returns a boolean Series with True for each value where a match is found and False otherwise. df.loc[df[col].str.contains(...), :] only returns those rows where the match was found
  • DataFrame.shape returns the dimensions of the data frame as a (rows, columns) tuple, so .shape[0] returns the number of rows.
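To illustrate the first note, the sketch below reproduces the 0 from the question and then counts with the plain re module instead. The sample values come from the comment under the question and are an assumption about the real data; it also assumes every value is a string (no NaN).

```python
import re
import pandas as pd

# Sample values from the comment under the question (assumed, not the real data).
numbers = pd.Series(['A12345678', 'abc', 'a12345678'], name='Student Number')

# The JavaScript-style pattern from the question: '/' and 'i' are literal here,
# and '^' can never match after a '/' has been consumed, so nothing is found.
# str(numbers) is also just the printed form of the Series, not its values.
js_style = r'/^[A-Z]{1}\d{8}$/i'
broken_count = sum(1 for _ in re.finditer(js_style, str(numbers)))
print(broken_count)  # 0

# Python-style pattern with the flag passed as an option, tested per value.
pattern = re.compile(r'^[A-Z]\d{8}$', re.IGNORECASE)
count = sum(1 for value in numbers if pattern.search(value))
print(count)  # 2

# Summing the boolean mask is an equivalent way to count in pandas.
mask_count = int(numbers.str.contains(r'(?i)^[A-Z]\d{8}$').sum())
print(mask_count)  # 2
```

Summing the mask avoids building the filtered frame, which may be preferable when only the count is needed.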

Wiktor Stribiżew