How can I count occurrences of a string in a dataframe in Python?

Question

I'm trying to count the number of ships in a column of a dataframe. In this case I'm trying to count the number of 77Hs. I can do it for individual elements but actions on the whole column don't seem to work

E.g. This works with an individual element in my dataframe

df = pd.DataFrame({'Route':['Callais','Dover','Portsmouth'],'shipCode':[['77H','77G'],['77G'],['77H','77H']]})
df['shipCode'][2].count('77H')

But when I try and perform the action on every row using either

df['shipCode'].count('77H')
df['shipCode'].str.count('77H')

It fails with both attempts, any help on how to code this would be much appreciated

Thanks

Filter your dataframe with those values and do value_counts on the resulting dataframe — Teja, Jun 18 '20 at 17:08
Does this answer your question? [How to filter a dictionary according to an arbitrary condition function?](https://stackoverflow.com/questions/2844516/how-to-filter-a-dictionary-according-to-an-arbitrary-condition-function) or [Search for a value anywhere in a pandas DataFrame](https://stackoverflow.com/questions/53979403/search-for-a-value-anywhere-in-a-pandas-dataframe) if you actually have a pandas dataframe; in this case, you do not. — Trenton McKinney, Jun 18 '20 at 17:12

Derek Eden · Accepted Answer · 2020-06-18T17:21:20.633

what if you did something like this??

assuming your initial dictionary...

import pandas as pd
from collections import Counter
df = pd.DataFrame(df) #where df is the dictionary defined in OP

you can generate a Counter for all of the elements in the lists in each row like this:

df['counts'] = df['shipCode'].apply(lambda x: Counter(x))

output:

        Route    shipCode                counts
0     Callais  [77H, 77G]  {'77H': 1, '77G': 1}
1       Dover       [77G]            {'77G': 1}
2  Portsmouth  [77H, 77H]            {'77H': 2}

or if you want one in particular, i.e. '77H', you can do something like this:

 df['counts'] = df['shipCode'].apply(lambda x: Counter(x)['77H'])

output:

        Route    shipCode  counts
0     Callais  [77H, 77G]       1
1       Dover       [77G]       0
2  Portsmouth  [77H, 77H]       2

or even this using the first method (full Counter in each row):

[count['77H'] for count in df['counts']]

output:

[1, 0, 2]

Thanks a lot! Exactly what I was looking for – teajay Jun 18 '20 at 17:20 — teajay, Jun 18 '20 at 17:20

score 0 · Answer 2 · answered Jun 18 '20 at 17:28

The data frame has a shipcode column with a list of values.

First show a True or False value to identify rows that contain the string '77H' in the shipcode column.

> df['shipcode'].map(lambda val: val.count('77H')>0)

Now filter the data frame based on those True/False values obtained in the previous step.

> df[df['shipcode'].map(lambda val: val.count('77H')>0)]

Finally, get a count for all values in the data frame where the shipcode list contains a value matching '77H' using the python len method.

> len(df[df['shipcode'].map(lambda val: val.count('77H')>0)])

Another way that makes it easy to remember what's been analyzed is to create a column in the same data frame to store the True/False value. Then filter by the True/False values. It's really the same as above but a little prettier in my opinion.

> df['filter_column'] = df['shipcode'].map(lambda val: val.count('77H')>0)
> len(df[df['filter_column']])

Good luck and enjoy working with Python and Pandas to process your data!

Oh, goodness! I didn't notice that you asked for a total count of occurrences. I thought you were looking for a count of rows containing the 77H value in the shipcode column. — Gui LeFlea, Jun 20 '20 at 05:21

How can I count occurrences of a string in a dataframe in Python?

2 Answers2