Many similar questions have been asked, and they helped me a lot with this problem. I followed the help from "Fill in missing dates of groupby" and "Pandas- adding missing dates to DataFrame while keeping column/index values?"; however, it is still not doing the trick.
I made a toy dataset to demonstrate the issue that I am facing:
import pandas as pd

data = pd.DataFrame({'Date': ['2012-01-01', '2012-01-01','2012-01-01','2012-01-02','2012-01-02','2012-01-02','2012-01-03'], 'Id': ['21','21','22','21','22','23','21'], 'Quantity': ['5','1','4','4','2','1','4'], 'NetAmount': ['66','45','76','35','76','73','45']})
# Quantity and NetAmount are created as strings, so convert them to numbers
data['Quantity'] = data['Quantity'].astype('int')
data['NetAmount'] = data['NetAmount'].astype('float')
I grouped the dataset as shown in the code below:
# shift every date back one week, then sum Quantity and NetAmount per Id and week (weeks end on Monday)
data['Date'] = pd.to_datetime(data.Date) - pd.to_timedelta(7, unit='d')
data = data.groupby(['Id', pd.Grouper(key='Date', freq='W-MON')])[['Quantity', 'NetAmount']].sum().reset_index().sort_values('Date')
data1 = data.groupby(['Id', 'Date']).agg({'Quantity': 'sum', 'NetAmount': 'sum'}).reset_index()
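To make the weekly totals easier to follow, here is a minimal sketch of the same grouping on the toy data (the numeric columns are typed in directly instead of converted with astype). It assumes a pandas version where selecting several columns after groupby needs a list, `[['Quantity', 'NetAmount']]`. With `freq='W-MON'`, pandas closes each weekly bin on the right and labels it with the Monday that ends it, so after the 7-day shift, 2011-12-25 and 2011-12-26 land in the week labelled 2011-12-26, while 2011-12-27 lands in the week labelled 2012-01-02.

```python
import pandas as pd

# Same toy data as above, with Quantity and NetAmount already numeric
data = pd.DataFrame({
    'Date': ['2012-01-01', '2012-01-01', '2012-01-01', '2012-01-02',
             '2012-01-02', '2012-01-02', '2012-01-03'],
    'Id': ['21', '21', '22', '21', '22', '23', '21'],
    'Quantity': [5, 1, 4, 4, 2, 1, 4],
    'NetAmount': [66.0, 45.0, 76.0, 35.0, 76.0, 73.0, 45.0],
})

# Shift every date back one week, as in the question
data['Date'] = pd.to_datetime(data['Date']) - pd.to_timedelta(7, unit='d')

# Weekly bins ending on Monday (right-closed, labelled by the right edge);
# only (Id, week) pairs that actually occur in the data show up in the result
weekly = (
    data.groupby(['Id', pd.Grouper(key='Date', freq='W-MON')])[['Quantity', 'NetAmount']]
    .sum()
    .reset_index()
    .sort_values(['Date', 'Id'])
)
print(weekly)
```

This prints four rows: Id 21 gets Quantity 10 and NetAmount 146.0 in the week of 2011-12-26 and 4 / 45.0 in the week of 2012-01-02, while Id 22 (6 / 152.0) and Id 23 (1 / 73.0) only appear in the week of 2011-12-26. These are the 45, 73, 146 and 152 NetAmount values that show up in the filled output below.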
Then I fill in the missing dates:
data2 = (
    data1.set_index(['Date', 'Id', 'NetAmount']).Quantity
    .unstack(-3)  # move the Date level into the columns
    .reindex(columns=pd.date_range(data1['Date'].min(), data1['Date'].max(), freq='W-MON'), fill_value=0)
    .stack(dropna=False).unstack().stack(dropna=False)  # stack the weeks back into the index
    .unstack('NetAmount')  # pivot the NetAmount level into the columns
    .stack(dropna=False).fillna(0).reset_index()
)
This gives the following DataFrame:
Id level_1 NetAmount 0
0 21 2011-12-26 45.0 0.0
1 21 2011-12-26 73.0 0.0
2 21 2011-12-26 146.0 10.0
3 21 2011-12-26 152.0 0.0
4 21 2012-01-02 45.0 4.0
5 21 2012-01-02 73.0 0.0
6 21 2012-01-02 146.0 0.0
7 21 2012-01-02 152.0 0.0
8 22 2011-12-26 45.0 0.0
9 22 2011-12-26 73.0 0.0
10 22 2011-12-26 146.0 0.0
11 22 2011-12-26 152.0 6.0
12 22 2012-01-02 45.0 0.0
13 22 2012-01-02 73.0 0.0
14 22 2012-01-02 146.0 0.0
15 22 2012-01-02 152.0 0.0
16 23 2011-12-26 45.0 0.0
17 23 2011-12-26 73.0 1.0
18 23 2011-12-26 146.0 0.0
19 23 2011-12-26 152.0 0.0
20 23 2012-01-02 45.0 0.0
21 23 2012-01-02 73.0 0.0
22 23 2012-01-02 146.0 0.0
23 23 2012-01-02 152.0 0.0
But I am actually expecting to get:
Id Date NetAmount Quantity
0 21 2011-12-26 66.0 5.0
1 21 2011-12-26 45.0 1.0
2 21 2011-12-26 35.0 4.0
3 21 2012-01-02 45.0 4.0
4 22 2011-12-26 76.0 4.0
5 22 2012-01-02 76.0 2.0
6 23 2011-12-26 0.0 0.0
7 23 2012-01-02 73.0 1.0
The fill worked; however, I do not really understand what is going on in the resulting DataFrame. For instance, the values in the NetAmount column are off. I am new to the stack/unstack functions; am I missing something in the process? Thank you for any help!
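A likely cause of the odd NetAmount values: NetAmount is one of the levels passed to set_index, so `unstack('NetAmount')` turns each distinct value (45, 73, 146 and 152) into a column, and the following `stack(dropna=False)` puts all of them back as rows for every (Id, week) pair. That is 3 Ids × 2 weeks × 4 values = 24 rows, exactly the size of the output above. A minimal sketch of one alternative, assuming the goal is one row per Id and week with 0 for weeks without sales (note this returns weekly totals, not the individual transactions listed in the expected output): keep only Id and Date in the index and reindex against every (Id, week) combination built with `pd.MultiIndex.from_product`. The weekly totals are typed in from the grouping step's output so the sketch runs on its own.

```python
import pandas as pd

# Weekly totals as produced by the groupby step (copied in, not recomputed)
data1 = pd.DataFrame({
    'Id': ['21', '22', '23', '21'],
    'Date': pd.to_datetime(['2011-12-26', '2011-12-26', '2011-12-26', '2012-01-02']),
    'Quantity': [10, 6, 1, 4],
    'NetAmount': [146.0, 152.0, 73.0, 45.0],
})

# Every Monday-ending week in the observed range, matching the Grouper's labels
weeks = pd.date_range(data1['Date'].min(), data1['Date'].max(), freq='W-MON')

# Target index: every (Id, week) pair; NetAmount stays a value column, not an index level
full_index = pd.MultiIndex.from_product(
    [sorted(data1['Id'].unique()), weeks], names=['Id', 'Date'])

data2 = (
    data1.set_index(['Id', 'Date'])
    .reindex(full_index, fill_value=0)  # missing (Id, week) rows get 0 in both columns
    .reset_index()
)
print(data2)
```

Ids 22 and 23 get a row for the week of 2012-01-02 with Quantity 0 and NetAmount 0.0, and the existing weekly totals are left as they were, so the result has 6 rows instead of 24.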
Update: I have tried regrouping by Id and Date after filling in the 0 values:
data2 = pd.DataFrame(data2)
data3 = data2.groupby(['Id','Date']).agg({'Quantity': sum, 'NetAmount': sum}).reset_index()
but I get this error:
Traceback (most recent call last):
File "", line 48, in <module>
data3 = data2.groupby(['Id','Date']).agg({'Quantity': sum, 'NetAmount': sum}).reset_index()
File "", line 7632, in groupby
observed=observed, **kwargs)
File "", line 2110, in groupby
return klass(obj, by, **kwds)
File "", line 360, in __init__
mutated=self.mutated)
File "", line 578, in _get_grouper
raise KeyError(gpr)
KeyError: 'Date'
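The KeyError seems to come from the column names: as the header of the resulting DataFrame shows, after reset_index the columns are Id, level_1, NetAmount and 0. The week level has no name (pd.date_range builds an unnamed index, so reindexing the columns with it drops the 'Date' name, and reset_index then calls that level level_1), and the stacked Quantity values end up in a column labelled 0. So neither 'Date' nor 'Quantity' exists. `pd.DataFrame(data2)` does not change that, since data2 is already a DataFrame. A minimal sketch of renaming the columns before grouping, using four rows copied from the output above:

```python
import pandas as pd

# A few rows with the same column labels as the question's data2:
# the unnamed week level became 'level_1' and the Quantity values became column 0
data2 = pd.DataFrame({
    'Id': ['21', '21', '21', '22'],
    'level_1': pd.to_datetime(['2011-12-26', '2011-12-26', '2012-01-02', '2012-01-02']),
    'NetAmount': [45.0, 146.0, 45.0, 45.0],
    0: [0.0, 10.0, 4.0, 0.0],
})

# Give the columns the names the groupby below expects
data2 = data2.rename(columns={'level_1': 'Date', 0: 'Quantity'})

data3 = (
    data2.groupby(['Id', 'Date'])
    .agg({'Quantity': 'sum', 'NetAmount': 'sum'})
    .reset_index()
)
print(data3)
```

Renaming only removes the KeyError. Because NetAmount was an index level rather than a value, every (Id, Date) group in the full data2 contains one row for each of 45, 73, 146 and 152, so its NetAmount sum would always be 416.0; even in this four-row sample, Id 21 in the week of 2011-12-26 sums to 191.0 instead of 146.0.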