0

I have a DataFrame that when simplified looks like the following.

import pandas as pd

report_date_rng = pd.DataFrame(pd.date_range(start='2021-08-01', end='2021-09-14', freq='7D'), columns=['date'])

inp = [{'Store':'A', 'Season':'Summer', 'Fruit':'Apple', 'needed':200, 'count':20}, 
       {'Store':'A', 'Season':'Summer', 'Fruit':'Apple', 'needed':200, 'count':100}, 
       {'Store':'A', 'Season':'Summer', 'Fruit':'Apple', 'needed':200, 'count':200},
       {'Store':'A', 'Season':'Summer', 'Fruit':'Apple', 'needed':200, 'count':200},
       {'Store':'B', 'Season':'Fall', 'Fruit':'Apple', 'needed':3000, 'count':0},
       {'Store':'B', 'Season':'Fall', 'Fruit':'Apple', 'needed':3000, 'count':200},
       {'Store':'B', 'Season':'Fall', 'Fruit':'Apple', 'needed':3000, 'count':1000}]
df = pd.DataFrame(inp)

df = report_date_rng.merge(df, left_index=True, right_index=True)

print(df)

        date Store  Season  Fruit  needed  count
0 2021-08-01     A  Summer  Apple     200     20
1 2021-08-08     A  Summer  Apple     200    100
2 2021-08-15     A  Summer  Apple     200    200
3 2021-08-22     A  Summer  Apple     200    200
4 2021-08-29     B    Fall  Apple    3000      0
5 2021-09-05     B    Fall  Apple    3000    200
6 2021-09-12     B    Fall  Apple    3000   1000

I think I am looking for a output to look like this:

  Store  Season  Fruit  needed  2021-08-01 2021-08-08 2021-08-15 2021-08-22 2021-08-29 2021-09-05 2021-09-12 
0 A      Summer  Apple  200     20         100        200        200
1 B      Fall    Apple  3000                                                0          200        1000

Basically I am trying to group the timeseries data by store the other categorical columns and then track the count for each group by date.

I would also prefer to do this in the long (row) format rather than the wide (column) format for the dates.

This is something I tried so far, but it does not work with the other categorical variables.

df[['date', 'Store', 'count']].set_index(['date','Store']).stack().unstack(0).reset_index()
bkeesey
  • 466
  • 4
  • 12
  • 1
    [pivot](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html) -> `df.pivot(index=['Store', 'Season', 'Fruit', 'needed'], columns='date', values='count').reset_index().rename_axis(columns=None)` – Henry Ecker Sep 14 '21 at 15:14
  • 1
    Date can be converted to string first if wanting just the Date Part like: `df.assign(date=df['date'].dt.strftime('%Y-%m-%d')).pivot(....)` – Henry Ecker Sep 14 '21 at 15:15
  • 1
    Thank you for the link and the help. Exactly what I am looking for! – bkeesey Sep 14 '21 at 15:25

0 Answers0