0

I have the following df:

time_series                    date   sales  q1  q2  q3
store_0025_item_85011885    2020-07-19  4.0 0.0 2.0 1.0
store_0025_item_85011885    2020-07-26  4.0 0.0 2.0 1.0
store_0025_item_85011885    2020-08-09  6.0 0.0 2.0 1.0
store_0025_item_85011885    2020-08-16  4.0 0.0 2.0 1.0
store_0053_item_85011885    2020-12-06  7.0 0.0 8.0 1.0
store_0053_item_85011885    2020-12-13  7.0 0.0 8.0 1.0
store_0053_item_85011885    2020-12-20  6.0 0.0 8.0 1.0
store_0053_item_85011885    2020-12-27  5.0 0.0 8.0 1.0

I tried use the pivot_table with this code:

df_p = pd.pivot_table(df_m, values='q2', index=['time_series'],
                    columns=['date'], fill_value=0)

But, return the columns with dates. What I want is return the df below:

time_series                 start_date   end_date   quantity
store_0025_item_85011885    2020-07-19   2020-07-26  2.0
store_0025_item_85011885    2020-08-09   2020-08-16  2.0
store_0053_item_85011885    2020-12-06   2020-12-27  8.0

See, that 'time_series' = store_0025_item_85011885, we have 2 consecutives weeks intervals, so we need 2 rows, but the 'time_series' = store_0053_item_85011885, we have only 1 consecutive interval, so we need 1 row. The quantity we need to copy is the 'q2' comlumn. How can I do this?

Vivian
  • 95
  • 5

1 Answers1

1

I am grouping by consecutive year week. for more explanation on grouping by consecutive elements see this:

Try:

import numpy as np
df.date = pd.to_datetime(df.date, format='%Y-%m-%d')
u = df.date.dt.strftime('%U').astype(int)
d = {'amin':'start_date','amax':'end_date','last':'quantity'}
df = df.groupby(['time_series', (u != u.shift()+1).cumsum()]).agg({'date' : [np.min, np.max], 'q2': 'last'}).rename(columns=d)
df.columns = df.columns.droplevel(0)

df:

                                start_date  end_date    quantity
time_series date            
store_0025_item_85011885    1   2020-07-19  2020-07-26  2.0
                            2   2020-08-09  2020-08-16  2.0
store_0053_item_85011885    3   2020-12-06  2020-12-27  8.0
Pygirl
  • 12,969
  • 5
  • 30
  • 43
  • 1
    use: `df = df.reset_index().drop('date', axis=1)` if you want in that format(mentioned in post) – Pygirl Jun 03 '21 at 15:41