
I have a df with the following structure:

my_df

date        hour  product
2019-06-06  17    laptopt
2019-06-06  15    printer
2019-06-07  14    laptopt
2019-06-07  17    desktop
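To try the answers, you can rebuild the sample frame above as follows. This is a minimal sketch that assumes `hour` holds integers and leaves `date` as plain strings, because the question does not state the column types. The `laptopt` spelling is kept exactly as it appears in the data.

```python
import pandas as pd

# Rebuild the sample frame; "laptopt" is kept as spelled in the question's data
my_df = pd.DataFrame({
    "date": ["2019-06-06", "2019-06-06", "2019-06-07", "2019-06-07"],
    "hour": [17, 15, 14, 17],           # assumed to be integers
    "product": ["laptopt", "printer", "laptopt", "desktop"],
})
print(my_df)
```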

How can I get a df like this:

hour  laptop  printer  desktop
14    1       0        0
15    0       1        0
16    0       0        0 
17    1       0        1

So far I've tried my_df.groupby(["product","hour"]).count().unstack(level=0), which gives:

        date
product desktop laptop  printer
hour            
14      NaN     1.0     NaN
15      NaN     NaN     1.0
17      1.0     1.0     NaN

and I'm stuck there: hour 16 is missing, and the absent combinations show NaN instead of 0.

Thanks.

Contrapunto

3 Answers


Store what you already have as unstacked, then reindex it over every hour from the minimum to the maximum and turn the NaNs into 0:

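import pandas as pd
unstacked = df.groupby(["product", "hour"]).count().unstack(level=0)
index = pd.RangeIndex(df.hour.min(), df.hour.max() + 1)  # every hour, inclusive
unstacked.reindex(index).fillna(0).astype(int)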
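Here is a variant of the same reindexing idea that swaps in a different counting step. It assumes only the number of rows per (hour, product) pair matters. `size()` counts rows, so the result has no extra `date` column level. `unstack(fill_value=0)` then fills the missing combinations directly, so no `fillna`/`astype` step is needed. The sample data is rebuilt from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2019-06-06", "2019-06-06", "2019-06-07", "2019-06-07"],
    "hour": [17, 15, 14, 17],
    "product": ["laptopt", "printer", "laptopt", "desktop"],
})

# Rows per (hour, product) pair; size() counts rows, so no "date" level appears
counts = df.groupby(["hour", "product"]).size().unstack(fill_value=0)

# Add the hours that never occur (here 16) as all-zero rows
hours = pd.RangeIndex(df["hour"].min(), df["hour"].max() + 1, name="hour")
result = counts.reindex(hours, fill_value=0)
print(result)
```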
John Zwinck

You can use pd.crosstab and reindex:

import pandas as pd
(pd.crosstab(df['hour'], df['product'])
   .reindex(pd.RangeIndex(df['hour'].min(), df['hour'].max()+1), fill_value=0))

product  desktop  laptopt  printer
14             0        1        0
15             0        0        1
16             0        0        0
17             1        1        0
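In the question's target layout, `hour` is an ordinary column rather than the index. The sketch below gets to that layout from the `crosstab` result, assuming that layout is what is wanted. The `reset_index` call and clearing `columns.name` are additions that are not part of the answer. The sample data is rebuilt from the question, keeping its `laptopt` spelling.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2019-06-06", "2019-06-06", "2019-06-07", "2019-06-07"],
    "hour": [17, 15, 14, 17],
    "product": ["laptopt", "printer", "laptopt", "desktop"],
})

# Frequency table: one row per observed hour, one column per product
table = pd.crosstab(df["hour"], df["product"])

# Extend to every hour in range, then move the hour index into a regular column
hours = pd.RangeIndex(df["hour"].min(), df["hour"].max() + 1, name="hour")
result = table.reindex(hours, fill_value=0).reset_index()
result.columns.name = None  # drop the "product" label shown above the columns
print(result)
```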
cs95

IIUC

(df.set_index('hour')['product'].str.get_dummies().groupby(level=0).sum()
   .reindex(range(df.hour.min(), df.hour.max() + 1), fill_value=0))
Out[15]: 
      desktop  laptopt  printer
hour                           
14          0        1        0
15          0        0        1
16          0        0        0
17          1        1        0
BENY