How do I find consecutive repeating numbers in my pandas column?

Question

I have two columns: one contains a string of digits and the other contains two or three digits, as below:

    Account number     
0   5493455646944        
1   56998884221          
2   95853255555926       
3   5055555555495718323  
4   56999998247361       
5   6506569568

I would like to create a regex function that flags an account number if it contains 5 or more consecutive, repeated digits.

So in theory, the target state is as follows:

    Account number     test
0   5493455646944        No
1   56998884221          No
2   95853255555926       Yes
3   5055555555495718323  Yes
4   56999998247361       Yes
5   6506569568           No

I was thinking something like:

def reg_finder(x):
    return re.findall('^([0-9])\1{5,}$', x)

I am not good with regex at all so unsure...thanks

Edit: this is what I tried:

def reg_finder(x):
    return re.findall('\b(\d)\1+\b', x)

example_df['test'] = example_df['Account number'].apply(reg_finder)

    Account number      test
0   5493455646944        []
1   56998884221          []
2   95853255555926       []
3   5055555555495718323  []
4   56999998247361       []
5   6506569568           []

Duplicate: take a look here https://stackoverflow.com/questions/6507982/regex-to-find-repeating-numbers — Damiaan, Apr 21 '22 at 13:49
Thanks but this does not work, I will update the post with the result. — work_python, Apr 21 '22 at 14:08
For some reason I get this: `TypeError: expected string or bytes-like object` — work_python, Apr 21 '22 at 14:18
There is no point using `re.findall` since you only want `Yes` or `No` as a result. — Wiktor Stribiżew, Apr 21 '22 at 14:29

Ynjxsjmh · Answer 1 · 2022-04-22T02:03:41.623

Problems in your regex re.findall('^([0-9])\1{5,}$', x):

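You use ^ and $, which anchor the pattern to the start and end of the string, so it only matches when the whole account number is a single repeated digit.
You want 5 or more repeated digits. The group ([0-9]) already matches the first one, so \1 only needs to repeat 4 or more times, i.e. {4,} instead of {5,}.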
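
To see both points concretely, here is a minimal sketch that uses only the standard `re` module on the account numbers from the question. The helper name `has_run` is made up for this illustration, and the anchored pattern is written as a raw string here so that only the anchors and the quantifier differ.

```python
import re

accounts = [
    "5493455646944",
    "56998884221",
    "95853255555926",
    "5055555555495718323",
    "56999998247361",
    "6506569568",
]

# Anchored pattern from the question: it can only match when the whole
# string is one digit repeated 6 or more times.
anchored = re.compile(r'^([0-9])\1{5,}$')

# Unanchored pattern: one captured digit plus 4 or more repeats of it,
# i.e. 5 or more identical digits in a row, anywhere in the string.
unanchored = re.compile(r'([0-9])\1{4,}')

def has_run(value):
    """Return 'Yes' if value contains 5+ identical consecutive digits, else 'No'."""
    # str() guards against integer input, which re rejects with a TypeError.
    return 'Yes' if unanchored.search(str(value)) else 'No'

flags = [has_run(a) for a in accounts]
print(flags)
print([bool(anchored.search(a)) for a in accounts])
```

If the column holds such values, the same helper could be applied row by row with `df['Account number'].map(has_run)`.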

You can use

import numpy as np

df['test'] = np.where(df['Account number'].astype(str).str.contains(r'([0-9])\1{4,}'), 'Yes', 'No')

# Or

df['test'] = np.where(df['Account number'].astype(str).str.contains(r'(\d)\1{4,}'), 'Yes', 'No')

print(df)

        Account number test
0        5493455646944   No
1          56998884221   No
2       95853255555926  Yes
3  5055555555495718323  Yes
4       56999998247361  Yes
5           6506569568   No

score 0 · Accepted Answer · answered Apr 21 '22 at 14:26

You can use

import pandas as pd
import warnings
warnings.filterwarnings("ignore", message="This pattern has match groups")

df = pd.DataFrame({'Account number':["5493455646944","56998884221","95853255555926","5055555555495718323","56999998247361","6506569568"]})
df['test'] = "No"
df.loc[df["Account number"].str.contains(r'([0-9])\1{4,}'), 'test'] = "Yes"

Output:

>>> df
        Account number test
0        5493455646944   No
1          56998884221   No
2       95853255555926  Yes
3  5055555555495718323  Yes
4       56999998247361  Yes
5           6506569568   No

Note that the regex r'([0-9])\1{4,}' is defined with a raw string literal, so each backslash is kept as a literal backslash instead of being interpreted as the start of a string escape sequence.
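
The effect of the raw string prefix can be checked directly. The following is a small sketch with made-up variable names (`plain`, `raw`, `sample`) that compares the same pattern text with and without the `r` prefix.

```python
import re

plain = '([0-9])\1{4,}'   # '\1' is parsed by Python as the single character chr(1)
raw = r'([0-9])\1{4,}'    # backslash is kept, so the regex engine sees a backreference

print(len('\1'), len(r'\1'))   # one character vs. two characters
print('\b' == '\x08')          # likewise, a non-raw '\b' is a backspace, not a word boundary

sample = "95853255555926"
plain_match = re.search(plain, sample)   # looks for a digit followed by chr(1) characters
raw_match = re.search(raw, sample)       # looks for a digit followed by 4+ copies of itself

print(plain_match)
print(raw_match.group())
```

This is also why the non-raw patterns in the question return empty lists even on rows that contain long runs.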

@work_python Note that I suppress the warning thrown by `Series.str.contains` since the capturing group is used on purpose, so as to use a backreference in the same pattern later. — Wiktor Stribiżew, Apr 21 '22 at 14:31

score 0 · Answer 3 · answered Mar 08 '23 at 02:11

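# Split each account number into one row per digit
dd1=df1.assign(col1=df1['Account number'].astype(str).map(list)).explode("col1")
# Give every run of identical consecutive digits its own id
col2=dd1.col1.ne(dd1.col1.shift()).cumsum()
# col3 holds the length of the run each digit belongs to
dd2=dd1.assign(test=col2).assign(col3=lambda dd:dd.groupby(['Account number',col2]).test.transform('size'))
# Flag an account if any of its runs has length 5 or more
dd2.groupby("Account number",sort=False,as_index=False).apply(lambda dd:"Yes" if dd.col3.ge(5).any() else "No")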

Output:

        Account number test
0        5493455646944   No
1          56998884221   No
2       95853255555926  Yes
3  5055555555495718323  Yes
4       56999998247361  Yes
5           6506569568   No
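
The same run-length idea (no regex) can be sketched in plain Python with `itertools.groupby`. This is not the answer's pandas code, just an illustration of the technique; the names `longest_run` and `flag` and the `min_run` parameter are hypothetical.

```python
from itertools import groupby

def longest_run(value):
    """Length of the longest run of identical consecutive characters in value."""
    # groupby yields one group per run of equal adjacent characters.
    return max((sum(1 for _ in group) for _, group in groupby(str(value))), default=0)

def flag(value, min_run=5):
    """Return 'Yes' if value has a run of at least min_run identical characters."""
    return "Yes" if longest_run(value) >= min_run else "No"

accounts = [
    "5493455646944",
    "56998884221",
    "95853255555926",
    "5055555555495718323",
    "56999998247361",
    "6506569568",
]

results = [flag(a) for a in accounts]
print(results)
```

With a DataFrame, `df1['Account number'].map(flag)` would give the same flags one row at a time.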
