0

I'm about to learn python. I hope this is not a stupid question but I could really need some help with this:

I have the following DataFrame "df_sales_2016_us" with 2935 entries. This is the head.

        Gender  Country     Size (US)   year    Month
7610    Female  United States   8.0     2016    1
7613    Female  United States   9.0     2016    1
7617    Male    United States   9.5     2016    1
7618    Female  United States   10.5    2016    1
7619    Male    United States   8.5     2016    1

"Month" contains values from 1 to 12 for each month. "Size (US") contains 16 different shoe size from 6.0 to 15

I would now like to create a new DataFrame looking like this:

  • The columns are the Month from 1 to 12 while the rows are the different shoe sizes from 6.0 to 15.
  • The single values should be the number of shoes for each size sold in each month.

How can I achieve this? This a table created with the value "0" just to clearify my goal.

        1   2   3   4   5   6   7   8   9   10  11  12
6.0     0   0   0   0   0   0   0   0   0   0   0   0
6.5     0   0   0   0   0   0   0   0   0   0   0   0
7.0     0   0   0   0   0   0   0   0   0   0   0   0
7.5     0   0   0   0   0   0   0   0   0   0   0   0
8.0     0   0   0   0   0   0   0   0   0   0   0   0
8.5     0   0   0   0   0   0   0   0   0   0   0   0
9.0     0   0   0   0   0   0   0   0   0   0   0   0
9.5     0   0   0   0   0   0   0   0   0   0   0   0
10.0    0   0   0   0   0   0   0   0   0   0   0   0
10.5    0   0   0   0   0   0   0   0   0   0   0   0
11.0    0   0   0   0   0   0   0   0   0   0   0   0
11.5    0   0   0   0   0   0   0   0   0   0   0   0
12.0    0   0   0   0   0   0   0   0   0   0   0   0
13.0    0   0   0   0   0   0   0   0   0   0   0   0
14.0    0   0   0   0   0   0   0   0   0   0   0   0
15.0    0   0   0   0   0   0   0   0   0   0   0   0

I tried to create the following DataFrame df_test, but I am out of options to proceed and think this is a totally wrong approach. (Here as an example only for "Month" "1"). Is there any option to get only the specific value needed for each column/row?

df_test = pd.DataFrame({'1':[df_sales_2016_us["Month"]==1],
                        '2':[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '3':[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '4': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '5': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '6': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '7': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '8': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '9': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '10': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '11': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],
                        '12': [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]
                       }, index=[df_sales_2016_us["Size (US)"].unique()])

df_test.sort_index()

The result for the way above is of course the following table as my approach does not get the single values needed but just informs, if "Month" is "1". But I have no idea how to go on from here or how to switch to another way to solve this. If someone would have an idea for this I would be very grateful. Thank you so much!

                                      1     2   3   4   5   6   7   8   9   10  11  12
6.0     7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
6.5     7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
7.0     7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
7.5     7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
8.0     7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
8.5     7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
9.0     7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
9.5     7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
10.0    7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
10.5    7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
11.0    7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
11.5    7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
12.0    7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
13.0    7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
14.0    7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
15.0    7617 True 7619 True 7629 True 7...  0   0   0   0   0   0   0   0   0   0   0
Timo
  • 33
  • 4

1 Answers1

4

You can do pd.crosstab:

pd.crosstab(df['Size (US)'], df['Month'])

If you want to count the sales by gender, you can do that too:

pd.crosstab([df['Gender'], df['Size (US)']], df['Month'])
Quang Hoang
  • 146,074
  • 10
  • 56
  • 74
  • YES! Crosstab is the deal! Thank you so much :) . I should have found that sooner and by myself. I think my searching terms have not been good enough. – Timo Dec 18 '20 at 14:36