pandas: count string criteria across down rows

Question

I have a dataframe that looks like:

  col_1
0  A
1  A:C:D
2  A:B:C:D:E
3  B:D

I'm trying to count each ':' to get to:

  col_1        count
0  A             0
1  A:C:D         2
2  A:B:C:D:E     4
3  B:D           1

I've tried applying a function to no avail:

def count_fx(df):
    return df.str.contains(':').sum()
df['count'] = df['col_1'].apply(count_fx)

Also,

df['count'] = df['col_1'].apply(lambda x: (x.str.contains(':')).sum(), axis=1)

Russia Must Remove Putin · Answer 1 · 2014-06-16T15:03:00.543

3

You need to call Python's built-in str.count method on each element:

df['count'] = df[0].apply(lambda x: x.count(':'))

           0  count
0      A:C:D      2
1  A:B:C:D:E      4
2        B:D      1
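For reference, here is a minimal, self-contained sketch of the same idea run against the question's data. The frame is rebuilt by hand, and the `count_vectorized` column name is only illustrative. It also shows the vectorized `Series.str.count`, which treats its argument as a regular expression (a bare ':' has no special meaning there):

```python
import pandas as pd

# Rebuild the frame from the question (assumed to hold plain strings).
df = pd.DataFrame({'col_1': ['A', 'A:C:D', 'A:B:C:D:E', 'B:D']})

# Element-wise: each x is a plain Python str, so x.count is str.count.
df['count'] = df['col_1'].apply(lambda x: x.count(':'))

# Vectorized alternative: the pattern is interpreted as a regex.
df['count_vectorized'] = df['col_1'].str.count(':')

print(df)
```

Both columns should hold the same counts for this data.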

edited Jun 16 '14 at 15:03

answered Jun 16 '14 at 14:56

Russia Must Remove Putin


Why not use the built-in string method rather than apply? `df[0].str.count(':')` – chrisb Jun 16 '14 at 15:09
I get an assertion error for that: `AssertionError: Level : must be same as name (None)` – Russia Must Remove Putin Jun 16 '14 at 15:12
@chrisb Using that function is actually slower than using `apply` (see my answer). – Ffisegydd Jun 16 '14 at 15:15

score 2 · Accepted Answer · edited Jun 17 '14 at 03:33

Calling apply() on a single column passes each element to your function one at a time, so the function receives a plain Python string rather than a Series or DataFrame, and a plain string has no .str accessor. You just need to change your function to:

def count_fx(s):
    return s.count(':')

Then you can use what you had:

In [11]: df['count'] = df['col_1'].apply(count_fx)

In [12]: df
Out[12]:
       col_1  count
0          A      0
1      A:C:D      2
2  A:B:C:D:E      4
3        B:D      1
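To see why the original attempt fails, the sketch below records the type of each value that apply() hands to the function. The `seen_types` list and the `broken_count_fx` name are hypothetical helpers added only for illustration:

```python
import pandas as pd

s = pd.Series(['A', 'A:C:D', 'A:B:C:D:E', 'B:D'], name='col_1')

seen_types = []  # illustrative: records what apply() passes in

def count_fx(value):
    seen_types.append(type(value))
    return value.count(':')  # str.count on a plain Python string

counts = s.apply(count_fx)

def broken_count_fx(value):
    # Mirrors the question's attempt: .str exists on a Series, not on a str.
    return value.str.contains(':').sum()

try:
    s.apply(broken_count_fx)
    error = None
except AttributeError as exc:
    error = exc

print(seen_types[0], counts.tolist(), type(error).__name__)
```

Each recorded type is `str`, which is why the `.str` accessor raises an AttributeError inside the function.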

edited Jun 17 '14 at 03:33

Russia Must Remove Putin


answered Jun 16 '14 at 15:01

chrisaycock

