I have to count and sum totals over a dataframe, but with a condition:

fruit   days_old
apple   4
apple   5
orange  1
orange  5

I need to count only the fruits that are more than 3 days old, so the output I need is:

2 apples and 1 orange

I thought I would have to use an apply function, but I would have to save each fruit type to a variable or something. I'm sure there's an easier way.

ps. I've been looking but I don't see a clear way to create tables here with proper spacing. The only thing that's clear is to not copy and paste with tabs!

Chuck
  • Almost a dupe: [what is the most efficient way of counting occurrences in pandas?](https://stackoverflow.com/questions/20076195/what-is-the-most-efficient-way-of-counting-occurrences-in-pandas) (missing the filter part). – pault Apr 12 '18 at 17:40

4 Answers

One way is to use pd.Series.value_counts:

res = df.loc[df['days_old'] > 3, 'fruit'].value_counts()

# apple     2
# orange    1
# Name: fruit, dtype: int64

Using pd.DataFrame.apply is inadvisable here, since it loops over rows in Python and is slower than the vectorised filter above.
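For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's table and applies the same filter-then-count step. The variable name `mask` is only for illustration and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "orange"],
    "days_old": [4, 5, 1, 5],
})

# Boolean mask: True for rows that are more than 3 days old.
mask = df["days_old"] > 3

# .loc applies the row filter and selects the 'fruit' column in one step,
# then value_counts tallies how often each fruit appears.
res = df.loc[mask, "fruit"].value_counts()

print(res.to_dict())  # {'apple': 2, 'orange': 1}
```

Depending on the pandas version, the `Name:` line printed for the resulting Series may differ from the output shown above; the counts themselves should be the same.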

jpp
  • All great answers, thanks. I knew how to count but didn't know where to put the condition. Cheers! – Chuck Apr 12 '18 at 17:50

You can use value_counts():

In [120]: df[df.days_old > 3]['fruit'].value_counts()
Out[120]:
apple     2
orange    1
Name: fruit, dtype: int64
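If the threshold is likely to change, the same filter-then-count idea can be wrapped in a small function. This is only a sketch: `count_older_than` and its `min_days` parameter are hypothetical names, not something from the original answer.

```python
import pandas as pd


def count_older_than(df, min_days):
    """Hypothetical helper: count rows per fruit where days_old > min_days."""
    filtered = df[df["days_old"] > min_days]
    return filtered["fruit"].value_counts()


# Sample data copied from the question.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "orange"],
    "days_old": [4, 5, 1, 5],
})

counts = count_older_than(df, 3)
print(counts.to_dict())  # {'apple': 2, 'orange': 1}
```

Fruits with no rows above the threshold simply do not appear in the result, because they are filtered out before counting.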
user3483203

I wanted in on the variation party.

pd.factorize + np.bincount

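# f: integer code per row; u: unique fruit labels in order of first appearance
f, u = pd.factorize(df.fruit)
# The boolean mask is used as weights, so bincount sums the matching rows per code
pd.Series(
    np.bincount(f, df.days_old > 3).astype(int), u
)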

apple     2
orange    1
dtype: int64
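One difference from the filter-first approaches is worth seeing in a runnable form: because every row is factorized before the condition is applied, a fruit with no matching rows still shows up with a count of 0. The sketch below assumes an extra `pear` row that is not in the question's data, added purely to illustrate this.

```python
import numpy as np
import pandas as pd

# Question's data plus one hypothetical 'pear' row that fails the condition.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "orange", "pear"],
    "days_old": [4, 5, 1, 5, 2],
})

# codes: one integer per row; uniques: labels in order of first appearance.
codes, uniques = pd.factorize(df["fruit"])

# Use the condition as 0/1 weights so bincount sums matches per code.
weights = (df["days_old"] > 3).to_numpy(dtype=float)
counts = pd.Series(np.bincount(codes, weights=weights).astype(int), index=uniques)

print(counts.to_dict())  # {'apple': 2, 'orange': 1, 'pear': 0}
```

Whether the zero rows are wanted depends on the use case; the filter-then-count versions drop them instead.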
piRSquared

The value_counts() methods described by @jpp and @chrisz are great. Just to post another strategy, you can use groupby:

df[df.days_old > 3].groupby('fruit').size()

# fruit
# apple     2
# orange    1
# dtype: int64
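The question also mentions summing totals. Assuming that means totalling `days_old` per fruit (the question does not say so explicitly), the groupby form extends naturally with `agg`. A minimal sketch, with `older` and `summary` as illustrative names:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "fruit": ["apple", "apple", "orange", "orange"],
    "days_old": [4, 5, 1, 5],
})

# Keep only rows that are more than 3 days old.
older = df[df["days_old"] > 3]

# Per fruit: number of matching rows and the total of their days_old values.
summary = older.groupby("fruit")["days_old"].agg(["count", "sum"])

print(summary)
```

With the question's data this should give a count of 2 and a sum of 9 for apple, and a count of 1 and a sum of 5 for orange.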
sacuL