4

I have a dataframe that looks something like that (the date is in the format: dd/mm/yyyy):

      Param1 Param2    date       value
1       a      b    30/10/2007      5
2       a      b    31/10/2007      8
3       a      b    01/11/2007      9
4       a      b    01/12/2007      3
5       a      b    02/12/2007      2
6       a      b    01/03/2008      11
7       b      c    05/10/2008      7
8       b      c    06/10/2008      13
9       b      c    07/10/2008      19
10      b      c    08/11/2008      22
11      b      c    09/11/2008      35
12      b      c    08/12/2008      5

what I need to do, is to group by Param1 and Param2, and to create N (in my case, 3) additional columns for the 3 last previous values, that are at least 30 days away from the current row. So the output should look something like that:

      Param1 Param2    date       value  prev_1 prev_2 prev_3
1       a      b    30/10/2007      5     None   None   None
2       a      b    31/10/2007      8     None   None   None
3       a      b    01/11/2007      9     None   None   None
4       a      b    01/12/2007      3      9      8      5
5       a      b    02/12/2007      2      9      8      5
6       a      b    01/03/2008      11     2      3      9
7       b      c    05/10/2008      7     None   None   None
8       b      c    06/10/2008      13    None   None   None
9       b      c    07/10/2008      19    None   None   None
10      b      c    08/11/2008      22      19    13     7
11      b      c    09/11/2008      35      19    13     7 
12      b      c    08/12/2008      5       22    19    13 

I've tried using set_index, stack and related functions, but I just couldn't figure it out (without an ugly for).

Any help will be appreciated!

EDIT: while it is similar to this question: question

It is not exactly the same, because you can't do a simple shift as you need to check the condition of at least 30 days gap.

Binyamin Even
  • 3,318
  • 1
  • 18
  • 45
  • Possible duplicate of [How to create a lagged data structure using pandas dataframe](https://stackoverflow.com/questions/20410312/how-to-create-a-lagged-data-structure-using-pandas-dataframe) – maxymoo Jan 22 '18 at 03:17

1 Answers1

1

Here is my suggestion:

data.date = pd.to_datetime(data.date, dayfirst=True)
data['ind'] = data.index

def func(a):
    aa = data[(data.ind<a.ind)\
        &(data.Param1==a.Param1)&(data.Param2==a.Param2)&(data.date<=(a.date-np.timedelta64(30, 'D')))]
    aaa = [np.nan]*3+list(aa.value.values)
    aaaa = pd.Series(aaa[::-1][:3], index=['prev_1', 'prev_2', 'prev_3'])
    return pd.concat([a, aaaa])

data.apply(func, 1).drop('ind',1)

enter image description here

Ezer K
  • 3,637
  • 3
  • 18
  • 34
  • It is not a perfect solution as it uses sequential `apply`, which is more or less equivalent to `for`, but as it is the best (and only :) ) answer I got, I will accept it. Thank you! – Binyamin Even Jan 22 '18 at 14:15
  • thanks, let us know if you find non-sequential solution, I bet there is. – Ezer K Jan 22 '18 at 20:35