1

I have a dataframe, df1, which consists of entities across time and their corresponding value. df1 looks like this:

Index        Date    ID   Values           
   0     2016-11-15   1      3
   1     2016-11-16   1      5
   2     2016-11-16   2      6
   3     2016-11-17   2      7            
   4     2016-11-18   2      2   

Obviously, there are gaps in the dates for both IDs.
I would first like to fill the date and ID gaps so it looks like this:

 Index        Date   ID   Values           
   0     2016-11-15   1      3
   1     2016-11-16   1      5
   2     2016-11-17   1      0
   3     2016-11-18   1      0
   4     2016-11-15   2      0
   5     2016-11-16   2      6
   6     2016-11-17   2      7            
   7     2016-11-18   2      2    

Then, I want to create a lagged column for "Values" so it uses the previous day's value for each id:

 Index        Date   ID   Values  Lagged Values          
   0     2016-11-15   1      3         Nan
   1     2016-11-16   1      5          3
   2     2016-11-17   1      0          5
   3     2016-11-18   1      0          0
   4     2016-11-15   2      0         Nan
   5     2016-11-16   2      6          0
   6     2016-11-17   2      7          6 
   7     2016-11-18   2      2          7

Thanks!

Tylerr
  • 201
  • 2
  • 12

2 Answers2

0

You can use cumcount +1 to get the ID Column from dates and then you can use numpy and set nan to column based on the cumcount = 0 i.e
Example:

import numpy as np
df = pd.DataFrame({'Date':['15-6-17','16-6-17','17-6-17','18-6-17','15-6-17','16-6-17','17-6-17','18-6-17'],"Values":[3,5,7,0,0,1,7,9]})
df['ID'] = df.groupby(['Date']).cumcount()+1
df['Lagged Values'] = np.insert(df.Values.values,0,0)[:-1]
df.loc[df.groupby(["ID"]).cumcount() == 0,'Lagged Values']= np.nan
    
      Date  Values  ID  Lagged Values
0  15-6-17       3   1            NaN
1  16-6-17       5   1            3.0
2  17-6-17       7   1            5.0
3  18-6-17       0   1            7.0
4  15-6-17       0   2            NaN
5  16-6-17       1   2            0.0
6  17-6-17       7   2            1.0
7  18-6-17       9   2            7.0
Bharath M Shetty
  • 30,075
  • 6
  • 57
  • 108
0

Let's try this:

df2 = (df1.set_index('Date')
          .groupby('ID')
          .apply(lambda x: x.reindex_axis(pd.date_range(df1.Date.min(), df1.Date.max())))
          .drop('ID', axis=1)
          .fillna(0))

df2['Lagged Values'] = (df2.groupby('ID',as_index=False)
                          .apply(lambda x: x.shift(1).Values)
                          .reset_index(level=0, drop=True))

print(df2.reset_index())

Output:

   ID    level_1  Values  Lagged Values
0   1 2016-11-15     3.0            NaN
1   1 2016-11-16     5.0            3.0
2   1 2016-11-17     0.0            5.0
3   1 2016-11-18     0.0            0.0
4   2 2016-11-15     0.0            NaN
5   2 2016-11-16     6.0            0.0
6   2 2016-11-17     7.0            6.0
7   2 2016-11-18     2.0            7.0
Scott Boston
  • 147,308
  • 15
  • 139
  • 187