Regex count and print from a column

Question

I am trying to count the regex matches in a column and print the number found, but the code below keeps giving me 0. I suspect it is not iterating through the whole column. My code is below.

import re

pattern = ('/^[A-Z]{1}\d{8}$/i')
numbers = jan_df['Student Number']

iterator = re.finditer(pattern, str(numbers))
count = 0

for match in iterator:
    count+=1
print(count)

Does `pattern = r'^[A-Za-z]\d{8}$'` work? Do you mean `df = pd.DataFrame({'Student Number':['A12345678', 'abc', 'a12345678']})` should yield `2`? — Wiktor Stribiżew, May 30 '22 at 07:54

Wiktor Stribiżew · Answer 1 · 2022-05-31T07:36:17.283

You can use

df.loc[df['Student Number'].str.contains(r'^[A-Za-z]\d{8}$'), :].shape[0]

Or, if you plan to use a more specific regex and need to make it case insensitive:

df.loc[df['Student Number'].str.contains(r'^[A-Z]\d{8}$', case=False), :].shape[0]

# or

df.loc[df['Student Number'].str.contains(r'(?i)^[A-Z]\d{8}$'), :].shape[0]
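As a quick check, the three variants above can be run against the sample frame suggested in the comment under the question. This is only a minimal sketch: the sample data is an assumption borrowed from that comment, and summing the boolean mask is shown as an alternative way to count, not as part of the original answer.

```python
import pandas as pd

# Sample data taken from the comment under the question (an assumption,
# not the asker's real jan_df).
df = pd.DataFrame({'Student Number': ['A12345678', 'abc', 'a12345678']})
col = df['Student Number']

# Variant 1: both cases spelled out in the character class.
count_explicit = df.loc[col.str.contains(r'^[A-Za-z]\d{8}$'), :].shape[0]

# Variant 2: case-insensitive matching via the case argument.
count_case_arg = df.loc[col.str.contains(r'^[A-Z]\d{8}$', case=False), :].shape[0]

# Variant 3: case-insensitive matching via the inline (?i) flag.
count_inline = df.loc[col.str.contains(r'(?i)^[A-Z]\d{8}$'), :].shape[0]

# Alternative: True counts as 1, so summing the boolean mask gives the count.
count_sum = int(col.str.contains(r'^[A-Za-z]\d{8}$').sum())

print(count_explicit, count_case_arg, count_inline, count_sum)
```

If the column may contain missing values, passing `na=False` to `str.contains` is one way to treat them as non-matches so the mask stays purely boolean.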

Notes:

Regexes in Python are defined with string literals, not regex literals, so the /.../i syntax does not work: the slashes and the trailing i are treated as literal characters. Pass flags as options instead (for example, case=False), or use an inline flag such as (?i) at the start of the pattern.
{1} is always redundant in regex patterns and can be removed.
Series.str.contains returns a boolean Series that is True wherever a match is found, so df.loc[df[col].str.contains(...), :] returns only the rows where the pattern matched.
DataFrame.shape returns the dimensions of the data frame, so .shape[0] is the number of rows.
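To illustrate the first note with the plain `re` module, here is a hedged sketch of why the code in the question counts 0 and how a per-value check could look instead. The sample Series is an assumption (the same values as in the comment under the question), and `fullmatch` is used here in place of the `^...$` anchors.

```python
import re
import pandas as pd

# Hypothetical stand-in for jan_df['Student Number'].
numbers = pd.Series(['A12345678', 'abc', 'a12345678'], name='Student Number')

# The pattern from the question: in Python the slashes and the trailing "i"
# are literal characters, so "^" is required after a "/" and can never match.
js_style = r'/^[A-Z]{1}\d{8}$/i'
original_count = sum(1 for _ in re.finditer(js_style, str(numbers)))

# A per-value check: compile once with IGNORECASE and test each string,
# instead of searching the printed representation of the whole Series.
pattern = re.compile(r'[A-Z]\d{8}', re.IGNORECASE)
fixed_count = sum(
    1 for value in numbers
    if isinstance(value, str) and pattern.fullmatch(value)
)

print(original_count, fixed_count)
```

Note that `str(numbers)` produces the display text of the Series (index, values, and a footer) as one string, which is a second reason the original approach is unreliable even with a valid pattern.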
