sum occurrences of a string in pandas dataframe

Question

I have to count and sum totals over a dataframe, but with a condition:

fruit   days_old
apple   4
apple   5
orange  1
orange  5

I have to count with the condition that a fruit is over 3 days old. So the output I need is

2 apples and 1 orange

I thought I would have to use an apply function, but I would have to save each fruit type to a variable or something. I'm sure there's an easier way.

P.S. I've been looking, but I don't see a clear way to create tables here with proper spacing. The only thing that's clear is not to copy and paste with tabs!

Almost a dupe: [what is the most efficient way of counting occurrences in pandas?](https://stackoverflow.com/questions/20076195/what-is-the-most-efficient-way-of-counting-occurrences-in-pandas) (missing the filter part). — pault, Apr 12 '18 at 17:40

score 3 · Accepted Answer · answered Apr 12 '18 at 17:33

One way is to use pd.Series.value_counts:

res = df.loc[df['days_old'] > 3, 'fruit'].value_counts()

# apple     2
# orange    1
# Name: fruit, dtype: int64

Using pd.DataFrame.apply is inadvisable here: it runs a Python-level loop over the rows, which is much slower than the vectorized boolean mask above.
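For anyone who wants to run this end to end, here is a minimal sketch. It assumes the frame is built directly from the four rows in the question; the variable names `df` and `res` follow the answer, and `as_dict` is just an illustrative name.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "orange"],
    "days_old": [4, 5, 1, 5],
})

# The boolean mask keeps rows older than 3 days;
# value_counts then tallies how often each fruit remains.
res = df.loc[df["days_old"] > 3, "fruit"].value_counts()

# Plain dict of the counts, handy for formatting a message
as_dict = res.to_dict()
print(as_dict)
```

The condition lives entirely in the `.loc` row selector, so changing the threshold or adding a second condition (combined with `&`) only touches that one expression.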

jpp

All great answers, thanks. I knew how to count but didn't know where to put the condition. Cheers! – Chuck Apr 12 '18 at 17:50

score 3 · Answer 2 · answered Apr 12 '18 at 17:34

You can use value_counts():

In [120]: df[df.days_old > 3]['fruit'].value_counts()
Out[120]:
apple     2
orange    1
Name: fruit, dtype: int64

user3483203

score 3 · Answer 3 · answered Apr 12 '18 at 17:54

I wanted in on the variation party.

pd.factorize + np.bincount

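import numpy as np
import pandas as pd

# f: integer code per row; u: the unique fruit labels
f, u = pd.factorize(df.fruit)
# Weighted bincount sums the boolean mask (True counts as 1) per code
pd.Series(
    np.bincount(f, df.days_old > 3).astype(int), u
)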

apple     2
orange    1
dtype: int64
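One difference from the filter-then-count answers is worth spelling out: because every row is factorized before the mask is applied, a fruit with no rows over the threshold still shows up with a count of 0. The sketch below illustrates this. The extra "pear" row is a hypothetical addition, not part of the question's data, and the mask is cast to float explicitly so the `weights` argument is unambiguous.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    # "pear" is a hypothetical extra row, added only to show the zero case
    "fruit": ["apple", "apple", "orange", "orange", "pear"],
    "days_old": [4, 5, 1, 5, 2],
})

# codes: one integer per row; uniques: the label each integer stands for
codes, uniques = pd.factorize(df["fruit"])

# 1.0 where the row passes the condition, 0.0 otherwise
mask = (df["days_old"] > 3).to_numpy()

# Summing the mask per code counts the matching rows for every fruit
counts = pd.Series(
    np.bincount(codes, weights=mask.astype(float)).astype(int),
    index=uniques,
)

# For comparison: filtering first drops fruits with no matching rows
filtered = df.loc[mask, "fruit"].value_counts()

print(counts)
print(filtered)
```

Which behaviour is preferable depends on whether a zero row for "pear" is useful in the report or just noise.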

piRSquared

score 2 · Answer 4 · answered Apr 12 '18 at 17:45

The value_counts() methods described by @jpp and @chrisz are great. Just to post another strategy, you can use groupby:

df[df.days_old > 3].groupby('fruit').size()

# fruit
# apple     2
# orange    1
# dtype: int64
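A closely related variant, offered as a sketch rather than as part of the original answer, is to group the boolean condition itself by fruit and sum it, which avoids building the filtered frame. The names `by_filter` and `by_mask` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "orange"],
    "days_old": [4, 5, 1, 5],
})

# Approach from the answer: filter rows, then count group sizes
by_filter = df[df["days_old"] > 3].groupby("fruit").size()

# Variant: group the True/False condition by fruit and add up the Trues
by_mask = (df["days_old"] > 3).groupby(df["fruit"]).sum()

print(by_filter)
print(by_mask)
```

On this data both give the same counts. The second form keeps a group even when none of its rows pass the condition, which the filter-first form does not.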

sacuL
