Counting with condition in grouped dataframe

Question

I have a dataframe df with a list of datetime values and colours occurring each day that looks like this:

Date                    Info1
2020-02-03 12:03:22     RED
2020-02-03 16:11:21     RED
2020-02-03 09:00:00     GRN
2020-02-04 01:00:23     GRN
2020-02-05 06:12:55     RED
2020-02-05 20:14:00     GRN
2020-02-05 12:00:00     YLW
2020-02-05 22:01:12     RED
2020-02-05 11:11:09     GRN
2020-02-05 23:11:45     BLU
2020-02-06 12:00:00     BLU
2020-02-06 08:27:11     GRN

I am trying to create a DataFrame counting how many of each colour I have per day (regardless of the time), so the result should look like this: (If a day is missing I don't need to add it)

Date        RED GRN YLW BLU
2020-02-03  2   1   0   0
2020-02-04  0   1   0   0
2020-02-05  1   2   1   1
2020-02-06  0   1   0   0

I have listed all the existing colours in Info1 with

colours = df.Info1.unique()

I was thinking to cycle each element of this list and use it as a condition with a count(), but I can't figure out how.

So far I managed to count every non-null value of each day with

result=df.groupby(pd.Grouper(key='Date',freq='D'))['Info1'].count()

but I can't figure out how to specify a condition to count only a specific colour. Could someone point me in the right direction?

if you dont have a daily date in there then do you want that added in ? if not it seems like a simple pivot or crosstab `pd.crosstab(df["Date"], df["Info1"], df["Info1"], aggfunc="count").fillna(0)` — Umar.H, May 20 '20 at 23:10
@Datanovice sorry I should have specified that the Date column is actually made of datetime values including the time, I'll modify my request immediately! — albiff, May 20 '20 at 23:29
Does this answer your question? [How to pivot a dataframe](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) — Umar.H, May 21 '20 at 00:01

score 2 · Accepted Answer · answered May 20 '20 at 23:57

as rightly pointed out by @datanovice, crosstab can solve this for you

#read in data
df = pd.read_clipboard(sep='\s{2,}', parse_dates=['Date'])

pd.crosstab(df.Date.dt.date, df.Info1)

Info1      BLU  GRN RED YLW
Date                
2020-02-03  0   1   2   0
2020-02-04  0   1   0   0
2020-02-05  1   2   2   1
2020-02-06  1   1   0   0

score 0 · Answer 2 · answered May 20 '20 at 23:50

0

Look into the size method. Here's something that works:

(
    df.groupby(['Date', 'Info1'])
    .size()
    .reset_index()
    .rename(columns={0:'count'})
    .pivot(index='Date', columns='Info1', values='count')
    .reset_index()
    .fillna(0)
)

answered May 20 '20 at 23:50

yoskovia

350
2
10

thanks @yoskovia. The problem with this is that if I have two values in Date with same day but different time, they will be counted separately. – albiff May 21 '20 at 00:02

Counting with condition in grouped dataframe

2 Answers2