0

I have a dataframe df with a list of datetime values and colours occurring each day that looks like this:

Date                    Info1
2020-02-03 12:03:22     RED
2020-02-03 16:11:21     RED
2020-02-03 09:00:00     GRN
2020-02-04 01:00:23     GRN
2020-02-05 06:12:55     RED
2020-02-05 20:14:00     GRN
2020-02-05 12:00:00     YLW
2020-02-05 22:01:12     RED
2020-02-05 11:11:09     GRN
2020-02-05 23:11:45     BLU
2020-02-06 12:00:00     BLU
2020-02-06 08:27:11     GRN

I am trying to create a DataFrame counting how many of each colour I have per day (regardless of the time), so the result should look like this: (If a day is missing I don't need to add it)

Date        RED GRN YLW BLU
2020-02-03  2   1   0   0
2020-02-04  0   1   0   0
2020-02-05  1   2   1   1
2020-02-06  0   1   0   0

I have listed all the existing colours in Info1 with

colours = df.Info1.unique()

I was thinking to cycle each element of this list and use it as a condition with a count(), but I can't figure out how.

So far I managed to count every non-null value of each day with

result=df.groupby(pd.Grouper(key='Date',freq='D'))['Info1'].count()

but I can't figure out how to specify a condition to count only a specific colour. Could someone point me in the right direction?

albiff
  • 3
  • 2
  • 1
    if you dont have a daily date in there then do you want that added in ? if not it seems like a simple pivot or crosstab `pd.crosstab(df["Date"], df["Info1"], df["Info1"], aggfunc="count").fillna(0)` – Umar.H May 20 '20 at 23:10
  • 1
    @Datanovice sorry I should have specified that the Date column is actually made of datetime values including the time, I'll modify my request immediately! – albiff May 20 '20 at 23:29
  • Does this answer your question? [How to pivot a dataframe](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe) – Umar.H May 21 '20 at 00:01

2 Answers2

2

as rightly pointed out by @datanovice, crosstab can solve this for you

#read in data
df = pd.read_clipboard(sep='\s{2,}', parse_dates=['Date'])

pd.crosstab(df.Date.dt.date, df.Info1)

Info1      BLU  GRN RED YLW
Date                
2020-02-03  0   1   2   0
2020-02-04  0   1   0   0
2020-02-05  1   2   2   1
2020-02-06  1   1   0   0
sammywemmy
  • 27,093
  • 4
  • 17
  • 31
0

Look into the size method. Here's something that works:

(
    df.groupby(['Date', 'Info1'])
    .size()
    .reset_index()
    .rename(columns={0:'count'})
    .pivot(index='Date', columns='Info1', values='count')
    .reset_index()
    .fillna(0)
)
yoskovia
  • 350
  • 2
  • 10
  • thanks @yoskovia. The problem with this is that if I have two values in Date with same day but different time, they will be counted separately. – albiff May 21 '20 at 00:02