
Question

I have an email_alias column and I'd like to count the digits in each row's value and store the result in another column, using Python. So far I can only get a single total for the entire column.

Attempt

I tried: df['count_numbers'] = sum(c.isdigit() for c in df['email_alias'])

Example:

email_alias       count_numbers
thisisatest111      3
testnumber2         1
roganjosh
Max Bade

3 Answers


I believe this might be the simplest solution.

df['count_numbers'] = df['email_alias'].str.count(r'\d')
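
For reference, here is a minimal, self-contained sketch of that one-liner applied to the sample data from the question. The third row ('nodigits') is a made-up value added only to show the zero case; it is not from the original post.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row with no digits
df = pd.DataFrame({'email_alias': ['thisisatest111', 'testnumber2', 'nodigits']})

# str.count takes a regular expression; r'\d' matches a single digit,
# so the result is the number of digits in each row's string
df['count_numbers'] = df['email_alias'].str.count(r'\d')

print(df)
```

The raw-string prefix (r'...') keeps Python from treating the backslash in \d as a string escape before the pattern reaches the regex engine.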
oil_lamp

You can apply a custom Python function to the column. I don't think there's a vectorized way. sum() here takes advantage of the fact that bool is a subclass of int, so each True counts as 1.

import pandas as pd

def count_digits(string):
    return sum(item.isdigit() for item in string)

df = pd.DataFrame({'a': ['thisisatest111', 'testnumber2']})
df['counts'] = df['a'].apply(count_digits)

Your approach of:

df['count_numbers'] = sum(c.isdigit() for c in df['email_alias']) 

could not work because the right-hand side is evaluated once, to a single number, and df['count_numbers'] = then assigns that one value to every row in the column. With apply, the function is instead called once per row; the iteration is implicit, but it still runs at Python speed, so it's not vectorized. Then again, most of the .str accessor methods in pandas loop in Python too, even though the syntax suggests they will be faster than a for loop.
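
A small sketch of what the original attempt actually computes may help. It assumes the two sample rows from the question; the variable names total and per_row are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'email_alias': ['thisisatest111', 'testnumber2']})

# Iterating over a Series yields each cell's whole string, not its characters,
# so isdigit() is asked "is this entire string made of digits?" once per row
total = sum(c.isdigit() for c in df['email_alias'])

# Assigning a scalar broadcasts it: every row receives the same number
df['count_numbers'] = total

# Counting per row needs an inner loop over the characters of each string
per_row = [sum(ch.isdigit() for ch in s) for s in df['email_alias']]

print(total)
print(df)
print(per_row)
```

Neither sample string consists only of digits, so the outer sum comes out as zero here, and that zero is what ends up in every row. The nested comprehension is the explicit-loop equivalent of the apply call above.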

roganjosh

You can modify your code and get the same result in one line (idea from roganjosh's answer):

df["count_numbers"] = df["email_alias"].apply(lambda x: sum(c.isdigit() for c in x))
Sam S.