1

I'm trying to count the number of ships in a column of a dataframe. In this case I'm trying to count the number of 77Hs. I can do it for individual elements but actions on the whole column don't seem to work

E.g. This works with an individual element in my dataframe

df = pd.DataFrame({'Route':['Callais','Dover','Portsmouth'],'shipCode':[['77H','77G'],['77G'],['77H','77H']]})
df['shipCode'][2].count('77H')

But when I try and perform the action on every row using either

df['shipCode'].count('77H')
df['shipCode'].str.count('77H')

It fails with both attempts, any help on how to code this would be much appreciated

Thanks

teajay
  • 13
  • 5
  • 1
    Filter your dataframe with those values and do value_counts on the resulting dataframe – Teja Jun 18 '20 at 17:08
  • 1
    `df` in this example is `dict` not a `dataframe` – Trenton McKinney Jun 18 '20 at 17:11
  • Does this answer your question? [How to filter a dictionary according to an arbitrary condition function?](https://stackoverflow.com/questions/2844516/how-to-filter-a-dictionary-according-to-an-arbitrary-condition-function) or [Search for a value anywhere in a pandas DataFrame](https://stackoverflow.com/questions/53979403/search-for-a-value-anywhere-in-a-pandas-dataframe) if you actually have a pandas dataframe; in this case, you do not. – Trenton McKinney Jun 18 '20 at 17:12

2 Answers2

0

what if you did something like this??

assuming your initial dictionary...

import pandas as pd
from collections import Counter
df = pd.DataFrame(df) #where df is the dictionary defined in OP

you can generate a Counter for all of the elements in the lists in each row like this:

df['counts'] = df['shipCode'].apply(lambda x: Counter(x))

output:

        Route    shipCode                counts
0     Callais  [77H, 77G]  {'77H': 1, '77G': 1}
1       Dover       [77G]            {'77G': 1}
2  Portsmouth  [77H, 77H]            {'77H': 2}

or if you want one in particular, i.e. '77H', you can do something like this:

 df['counts'] = df['shipCode'].apply(lambda x: Counter(x)['77H'])

output:

        Route    shipCode  counts
0     Callais  [77H, 77G]       1
1       Dover       [77G]       0
2  Portsmouth  [77H, 77H]       2

or even this using the first method (full Counter in each row):

[count['77H'] for count in df['counts']]

output:

[1, 0, 2]
Derek Eden
  • 4,403
  • 3
  • 18
  • 31
0

The data frame has a shipcode column with a list of values.

First show a True or False value to identify rows that contain the string '77H' in the shipcode column.

> df['shipcode'].map(lambda val: val.count('77H')>0)

Now filter the data frame based on those True/False values obtained in the previous step.

> df[df['shipcode'].map(lambda val: val.count('77H')>0)]

Finally, get a count for all values in the data frame where the shipcode list contains a value matching '77H' using the python len method.

> len(df[df['shipcode'].map(lambda val: val.count('77H')>0)])

Another way that makes it easy to remember what's been analyzed is to create a column in the same data frame to store the True/False value. Then filter by the True/False values. It's really the same as above but a little prettier in my opinion.

> df['filter_column'] = df['shipcode'].map(lambda val: val.count('77H')>0)
> len(df[df['filter_column']])

Good luck and enjoy working with Python and Pandas to process your data!

Gui LeFlea
  • 795
  • 3
  • 12
  • Oh, goodness! I didn't notice that you asked for a total count of occurrences. I thought you were looking for a count of rows containing the 77H value in the shipcode column. – Gui LeFlea Jun 20 '20 at 05:21