I have a dataset of weekly sales of a few stores, which looks something similar to :
store_id | item_id | week | sales |
---|---|---|---|
store1 | item1 | 2021-01 | 3 |
store1 | item2 | 2021-01 | 2 |
store2 | item1 | 2021-01 | 10 |
store2 | item3 | 2021-01 | 1 |
store1 | item1 | 2021-02 | 5 |
store1 | item2 | 2021-02 | 1 |
store2 | item1 | 2021-02 | 11 |
store1 | item3 | 2021-03 | 6 |
store1 | item1 | 2021-04 | 7 |
store2 | item3 | 2021-04 | 2 |
I am trying to use pandas dataframe and transform this into a single entry per every (store, item) combination. I want to create separate columns of sales, for every unique entry in the week column.
Specifically for the example above, it should look like this after the transformation :
store_id | item_id | sales_week_2021-01 | sales_week_2021-02 | sales_week_2021-03 | sales_week_2021-04 |
---|---|---|---|---|---|
store1 | item1 | 3 | 5 | 0 | 7 |
store1 | item2 | 2 | 1 | 0 | 0 |
store1 | item3 | 0 | 0 | 6 | 0 |
store2 | item1 | 10 | 11 | 0 | 0 |
store2 | item3 | 1 | 0 | 0 | 2 |
[Note that multiple stores can share the same items].
I'm able to achieve this transformation using plain python dictionaries,etc but I need a pandas way to do it. The reason being these are huge datasets and I later might also want to transform this using py-spark to get the advantage of fast processing.
I'm pretty new to pandas, so please excuse if the question sounds naive to the community :)