I have a DataFrame that when simplified looks like the following.
import pandas as pd
report_date_rng = pd.DataFrame(pd.date_range(start='2021-08-01', end='2021-09-14', freq='7D'), columns=['date'])
inp = [{'Store':'A', 'Season':'Summer', 'Fruit':'Apple', 'needed':200, 'count':20},
{'Store':'A', 'Season':'Summer', 'Fruit':'Apple', 'needed':200, 'count':100},
{'Store':'A', 'Season':'Summer', 'Fruit':'Apple', 'needed':200, 'count':200},
{'Store':'A', 'Season':'Summer', 'Fruit':'Apple', 'needed':200, 'count':200},
{'Store':'B', 'Season':'Fall', 'Fruit':'Apple', 'needed':3000, 'count':0},
{'Store':'B', 'Season':'Fall', 'Fruit':'Apple', 'needed':3000, 'count':200},
{'Store':'B', 'Season':'Fall', 'Fruit':'Apple', 'needed':3000, 'count':1000}]
df = pd.DataFrame(inp)
df = report_date_rng.merge(df, left_index=True, right_index=True)
print(df)
date Store Season Fruit needed count
0 2021-08-01 A Summer Apple 200 20
1 2021-08-08 A Summer Apple 200 100
2 2021-08-15 A Summer Apple 200 200
3 2021-08-22 A Summer Apple 200 200
4 2021-08-29 B Fall Apple 3000 0
5 2021-09-05 B Fall Apple 3000 200
6 2021-09-12 B Fall Apple 3000 1000
I think I am looking for a output to look like this:
Store Season Fruit needed 2021-08-01 2021-08-08 2021-08-15 2021-08-22 2021-08-29 2021-09-05 2021-09-12
0 A Summer Apple 200 20 100 200 200
1 B Fall Apple 3000 0 200 1000
Basically I am trying to group the timeseries data by store the other categorical columns and then track the count for each group by date.
I would also prefer to do this in the long (row) format rather than the wide (column) format for the dates.
This is something I tried so far, but it does not work with the other categorical variables.
df[['date', 'Store', 'count']].set_index(['date','Store']).stack().unstack(0).reset_index()