0

I'm doing some data analysis in which I have the following simplified Pandas DataFrame

df = pd.DataFrame({"Size_US": [8,8,9,9,9,10], 
                   "Year": [2015,2015,2015,2016,2016,2016], 
                   "Month": [1,1,4,2,2,6]})

Size_US - type of a product (shoe size)

Year - year in which a sale of the product took place

Month - month in which a sale of the product took place

I need to count the number of sales per month by Size_US and by Year. Ideally, I want to construct a new data frame in which I would have as row indexes the unique values of Size_US and as columns the months of each year (2015 and 2016, so I would have 24 columns). In each row x column entry, I would have the number of sales of each Size_US in that month.

So in the example I gave, the entry corresponding to row label "8" and month "1-2015" (Jan-2015) would have the number 2, because there were two sales of size 8 in Month 1. In month "2-2015" I would have 0 as an entry. In month 4-2015, I would have 1 as entry and so on. In month 2-2016, I would have 2 as an entry...

I spent many hours trying to build such a data frame, but I failed miserably. I tried to use groupby(), tried to build auxiliary dictionaries, lists, data frames but I could not make it.

  • [`df.groupby(df.columns.tolist()).size().unstack(level=[1, 2], fill_value=0)`](https://akuiper.com/console/fkRlGlN-DRit) – Psidom Jan 07 '22 at 18:42
  • @G.Anderson I'm going to read through – brazilian_student Jan 07 '22 at 18:49
  • @Psidom it did work!!, but for some reason the columns got a bit mixed up regarding the years 2015 and 2016. I did not get a neat grouping from 1 to 12 for each year. I got 5:10 (2015) then 1:11 (2016) then 1:4 (2015) etc. Do you know why this happened? And if you please explain the syntax of what you did. I'm not sure if I understood all the steps. – brazilian_student Jan 07 '22 at 18:52

0 Answers0