Pandas create and populate new columns based on values of existing column

Question

I have a dataset of weekly sales of a few stores, which looks something similar to :

store_id	item_id	week	sales
store1	item1	2021-01	3
store1	item2	2021-01	2
store2	item1	2021-01	10
store2	item3	2021-01	1
store1	item1	2021-02	5
store1	item2	2021-02	1
store2	item1	2021-02	11
store1	item3	2021-03	6
store1	item1	2021-04	7
store2	item3	2021-04	2

I am trying to use pandas dataframe and transform this into a single entry per every (store, item) combination. I want to create separate columns of sales, for every unique entry in the week column.

Specifically for the example above, it should look like this after the transformation :

store_id	item_id	sales_week_2021-01	sales_week_2021-02	sales_week_2021-03	sales_week_2021-04
store1	item1	3	5	0	7
store1	item2	2	1	0	0
store1	item3	0	0	6	0
store2	item1	10	11	0	0
store2	item3	1	0	0	2

[Note that multiple stores can share the same items].
I'm able to achieve this transformation using plain python dictionaries,etc but I need a pandas way to do it. The reason being these are huge datasets and I later might also want to transform this using py-spark to get the advantage of fast processing.

I'm pretty new to pandas, so please excuse if the question sounds naive to the community :)

Could you perhaps share your code made in plain python without pandas to take a look at it for further helper? — Simon Provost, Nov 26 '21 at 11:23
Is this would works : (print it on your Jupiter for example): `res = df.groupby(['store_id', "item_id", "week"]).sum()` — Simon Provost, Nov 26 '21 at 11:32
Reference from pivot : `df.groupby(['store_id', 'item_id', 'week'])['sales'].mean().unstack(fill_value=0)` did the trick for me. Thanks! — Manish Tripathi, Nov 26 '21 at 17:37

Pandas create and populate new columns based on values of existing column

0 Answers0