I have a dataframe like the following (just an example):

 date       y     w   diff
 2010-1-1   3     1    3
 2010-1-2   4     1    4
 2010-1-3   5     1    2
 2010-1-4   6     2    5
 2010-1-5   7     2    6
 2010-1-6   8     2    5
 2010-1-7   9     3    2
 2010-1-8   10    4    4
 2010-1-9   11    5    5
 2010-1-10  12    6    6
 2010-1-11  13    5    6

Taking i as the row index of the dataframe, I want to add three new columns, p1, p2 and p3, whose values come from the first two rows of each period. I use five rows as one period. Within a period, the first two rows get NaN for p1, p2 and p3. For rows 3-5, p1 and p2 are the y values of the period's first two rows (3 and 4), and p3 is the diff value of the second of those two rows (4), so rows 3-5 all get 3, 4, 4. Likewise, for rows 8-10 the values of p1, p2, p3 are 8, 9, 2. The new dataframe should look like this:

 date       y     w   diff  p1   p2   p3
 2010-1-1   3     1    3    NaN  NaN  NaN
 2010-1-2   4     1    4    NaN  NaN  NaN
 2010-1-3   5     1    2    3    4    4
 2010-1-4   6     2    5    3    4    4
 2010-1-5   7     2    6    3    4    4
 2010-1-6   8     2    5    NaN  NaN  NaN
 2010-1-7   9     3    2    NaN  NaN  NaN
 2010-1-8   10    4    4    8    9    2
 2010-1-9   11    5    5    8    9    2
 2010-1-10  12    6    6    8    9    2
 2010-1-11  13    5    6    NaN  NaN  NaN

If anything in my question is unclear, please leave a comment. Thanks!

tktktk0711

1 Answer

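You can group by an array g, created with np.arange and floor division so that every block of five rows gets its own label, and apply a custom function that shifts each group down by two rows and then sets the values in the underlying NumPy array as required. Finally, add the result to the original dataframe with join: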

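import numpy as np
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# Label consecutive blocks of five rows: 0,0,0,0,0,1,1,...
g = np.arange(len(df.index)) // 5

def f(x):
    # Shift down two rows so the first two rows of each block become NaN
    x = x.shift(2)
    # Work on a writable float copy rather than on the frame's own data
    a = x.to_numpy(dtype=float, copy=True)
    if a.shape[0] > 3:
        a[3, 1] = a[3, 0]   # p2 = y of the block's second row
        a[3, 0] = a[2, 0]   # p1 = y of the block's first row
        a[2] = a[3]         # rows 3-5 of the block carry the same values
        a[4:] = a[3]        # slice, so a four-row block does not raise IndexError
    return pd.DataFrame(a, index=x.index, columns=['p1','p2','p3'])

# Select columns with a list (tuple-style selection was removed in pandas 1.0);
# group_keys=False keeps the original index so the join below aligns
df1 = df.groupby(g, group_keys=False)[['y','w','diff']].apply(f)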
print (df1)
     p1   p2   p3
0   NaN  NaN  NaN
1   NaN  NaN  NaN
2   3.0  4.0  4.0
3   3.0  4.0  4.0
4   3.0  4.0  4.0
5   NaN  NaN  NaN
6   NaN  NaN  NaN
7   8.0  9.0  2.0
8   8.0  9.0  2.0
9   8.0  9.0  2.0
10  NaN  NaN  NaN

df2 = df.join(df1)
print (df2)
         date   y  w  diff   p1   p2   p3
0  2010-01-01   3  1     3  NaN  NaN  NaN
1  2010-01-02   4  1     4  NaN  NaN  NaN
2  2010-01-03   5  1     2  3.0  4.0  4.0
3  2010-01-04   6  2     5  3.0  4.0  4.0
4  2010-01-05   7  2     6  3.0  4.0  4.0
5  2010-01-06   8  2     5  NaN  NaN  NaN
6  2010-01-07   9  3     2  NaN  NaN  NaN
7  2010-01-08  10  4     4  8.0  9.0  2.0
8  2010-01-09  11  5     5  8.0  9.0  2.0
9  2010-01-10  12  6     6  8.0  9.0  2.0
10 2010-01-11  13  5     6  NaN  NaN  NaN
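
For comparison, the same columns can probably be built without a custom apply function. The following is only a sketch under assumptions, not the method above: the helper name add_period_columns and its period parameter are made up for illustration, and the sample frame is rebuilt from the question's table. It picks the y and diff values at the first and second row of each block, broadcasts them over the block with transform("first"), and then blanks the first two rows of every block.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    "date": pd.date_range("2010-01-01", periods=11, freq="D"),
    "y": [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13],
    "w": [1, 1, 1, 2, 2, 2, 3, 4, 5, 6, 5],
    "diff": [3, 4, 2, 5, 6, 5, 2, 4, 5, 6, 6],
})

def add_period_columns(frame, period=5):
    """Hypothetical helper: add p1, p2, p3 per block of `period` rows."""
    pos = np.arange(len(frame))
    block = pos // period    # block label for every row
    within = pos % period    # position of the row inside its block

    out = frame.copy()
    # Keep only the value at the wanted position, then spread the first
    # non-null value of each block over the whole block.
    out["p1"] = frame["y"].where(within == 0).groupby(block).transform("first")
    out["p2"] = frame["y"].where(within == 1).groupby(block).transform("first")
    out["p3"] = frame["diff"].where(within == 1).groupby(block).transform("first")
    # The first two rows of every block stay empty
    out.loc[within < 2, ["p1", "p2", "p3"]] = np.nan
    return out

result = add_period_columns(df)
print(result)
```

One difference to be aware of: for an incomplete trailing block of three or four rows, this sketch still fills the rows after the second one, so check that this matches what you need.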
jezrael
  • Thanks @jezrael. I have another question; could you help me solve it? https://stackoverflow.com/questions/44752876/python2-pandas-how-to-merge-a-part-of-another-dataframe-to-a-dataframe – tktktk0711 Jun 26 '17 at 03:49