getting the previous values from the dataframe column

Question

I am new to Python. I have a dataframe that looks like this:

DocumentId    offset    feature
0             0         2000
0             7         2000
0             16        0
0             27        0
0             36        0
0             40        0
0             46        0
0             57        0
0             63        0
0             78        0
0             88        0
0             91        0
0             103       2200
1             109       0
1             113       2200
1             126       2200
1             131       2200
1             137       2200
1             142       0
1             152       0
1             157       200
1             159       200
1             161       200
1             167       0
1             170       200

In this dataframe, the feature column contains zeros and positive values.

Whenever a positive value is followed by a zero, I have to take the previous three values at that point (the last positive row and the two rows before it). We can assume the column always starts with a positive value.

From this data set, the first result will be:

offset1   feature   offset2   feature   offset3   feature
-         -         0         2000      7         2000

The `-` marks the missing third value, since only two rows exist before the first change.

The check then continues down the column: whenever the value changes from positive to zero, I have to take the previous three values.

Here, after the 2200 at offset 103, the value becomes 0 at offset 109.

So I have to take the three values before offset 109, and the data becomes:

offset1   feature   offset2   feature   offset3   feature
-         -         0         2000      7         2000
103       2200      91        0         88        0

After this, the value changes from positive to 0 at offset 142, so again I have to take the previous three:

offset1   feature   offset2   feature   offset3   feature
-         -         0         2000      7         2000
103       2200      91        0         88        0
137       2200      131       2200      126       2200
161       200       159       200       157       200

offset4   feature   offset5   feature   offset6   feature
103       2200      109       0         113       2200
113       2200      126       2200      131       2200
157       200       159       200       161       200

Now for the second table. The dataframe starts with a positive number. Going down, at offset `103` the value changes from 0 to a positive value, 2200.

Here I take the current value at offset 103 and the next two values, at offsets 109 and 113.

So the result will be:

offset4   feature   offset5   feature   offset6   feature
103       2200      109       0         113       2200

Going further down, at 113 the previous value was 0 and it changes to 2200.

So I take the next two values after 113, at offsets 126 and 131, and the next row becomes:

offset4   feature   offset5   feature   offset6   feature
113       2200      126       2200      131       2200

Further down, at 157 the value changes to 200, so the same rule applies.

For these next values I take the row itself and the next two rows, which gives 3 values. This applies whenever the value changes from 0 to a positive value.

Is there any way to achieve this result? Thanks.

What I tried:

zero_indexes = list(input_with[input_csv['RFC_PREDICTEDFEATURE'] == 0].index)
df2 = pd.DataFrame()
for each_zero_index in zero_indexes:
    value = input_with['feature'].loc[each_zero_index - 1: each_zero_index].values[0]
    if value != 0 :
        df1 = input_with.loc[each_zero_index - 3: each_zero_index]
        df2 = df2.append(df1)
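
A note on this attempt: `DataFrame.append` was removed in pandas 2.0, so on current versions the slices have to be collected and passed to `pd.concat`. Below is a minimal sketch of the same loop idea, not a definitive fix. It assumes the column is named `feature`, the index is a default RangeIndex, and that the three rows before each zero are wanted. The toy data and the names `pieces` and `result` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data, shaped like the sample in the question.
input_with = pd.DataFrame({
    "offset": [0, 7, 16, 27, 36, 40],
    "feature": [2000, 2000, 0, 0, 2200, 0],
})

# Index labels of every row whose feature is 0.
zero_indexes = input_with.index[input_with["feature"] == 0]

pieces = []
for i in zero_indexes:
    # Keep only zeros that directly follow a non-zero value.
    if i > 0 and input_with.at[i - 1, "feature"] != 0:
        # .loc slicing is label-based and inclusive: up to three rows before the zero.
        pieces.append(input_with.loc[max(i - 3, 0): i - 1])

# Concatenate once at the end instead of appending inside the loop.
result = pd.concat(pieces) if pieces else input_with.iloc[0:0]
print(result)
```

This stacks the matching rows vertically, as the original loop did; it does not yet spread them into one wide row per change.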

jezrael · Accepted Answer · 2019-11-05T10:20:31.027

Use strides:

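import numpy as np
import pandas as pd

def rolling_window(a, window):
    # view of every length-`window` slice of a 1-D array, no data is copied
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

n = 3
# pad with n - 1 NaNs so the first rows also get a full window
x = np.concatenate([[np.nan] * (n - 1), df['feature'].values])
# reverse each window with [:, ::-1] so column 0 holds the current row
arr1 = rolling_window(x, n)[:, ::-1]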

print (arr1)
[[2000.   nan   nan]
 [2000. 2000.   nan]
 [   0. 2000. 2000.]
 [   0.    0. 2000.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [   0.    0.    0.]
 [2200.    0.    0.]
 [   0. 2200.    0.]
 [2200.    0. 2200.]
 [2200. 2200.    0.]
 [2200. 2200. 2200.]
 [2200. 2200. 2200.]
 [   0. 2200. 2200.]
 [   0.    0. 2200.]
 [ 200.    0.    0.]
 [ 200.  200.    0.]
 [ 200.  200.  200.]
 [   0.  200.  200.]
 [ 200.    0.  200.]]

x = np.concatenate([[np.nan] * (n - 1), df['offset'].values])
arr2 = rolling_window(x, n)[:, ::-1]

print (arr2)
[[  0.  nan  nan]
 [  7.   0.  nan]
 [ 16.   7.   0.]
 [ 27.  16.   7.]
 [ 36.  27.  16.]
 [ 40.  36.  27.]
 [ 46.  40.  36.]
 [ 57.  46.  40.]
 [ 63.  57.  46.]
 [ 78.  63.  57.]
 [ 88.  78.  63.]
 [ 91.  88.  78.]
 [103.  91.  88.]
 [109. 103.  91.]
 [113. 109. 103.]
 [126. 113. 109.]
 [131. 126. 113.]
 [137. 131. 126.]
 [142. 137. 131.]
 [152. 142. 137.]
 [157. 152. 142.]
 [159. 157. 152.]
 [161. 159. 157.]
 [167. 161. 159.]
 [170. 167. 161.]]
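
As a side note, the two window arrays above can also be built with `numpy.lib.stride_tricks.sliding_window_view`, which is available from NumPy 1.20 and computes the shape and strides itself. This is only a sketch on a short made-up array; the name `rolling_window_safe` is hypothetical and not part of the answer.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def rolling_window_safe(a, window):
    """Read-only view of all length-`window` slices of a 1-D array."""
    return sliding_window_view(a, window)

# Hypothetical short sample standing in for df['feature'].values
feature = np.array([2000, 2000, 0, 0, 2200, 0], dtype=float)

n = 3
# Pad the front with n - 1 NaNs so every row gets a full window.
padded = np.concatenate([np.full(n - 1, np.nan), feature])
# Reverse each window so column 0 is the current row, column 1 the one before, etc.
windows = rolling_window_safe(padded, n)[:, ::-1]
print(windows)
```

The view is read-only by default, so boolean indexing (which copies) is needed before any in-place change such as reversing a row.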

Then create a mask for the rows where the first 'column' of arr1 is non-zero and the next row's value is 0:

m = np.append(arr1[1:, 0] == 0, False) & (arr1[:, 0] != 0)
arr1 = arr1[m]
arr2 = arr2[m]

#reverse the first row so its NaN padding comes first, as in the expected output
arr1[:1] = arr1[:1, ::-1]
arr2[:1] = arr2[:1, ::-1]

#create DataFrames and join together
df1 = pd.DataFrame(arr1).add_prefix('feature_')
df2 = pd.DataFrame(arr2).add_prefix('offset_')
print (df1)
   feature_0  feature_1  feature_2
0        NaN     2000.0     2000.0
1     2200.0        0.0        0.0
2     2200.0     2200.0     2200.0
3      200.0      200.0      200.0

print (df2)
   offset_0  offset_1  offset_2
0       NaN       0.0       7.0
1     103.0      91.0      88.0
2     137.0     131.0     126.0
3     161.0     159.0     157.0

#https://stackoverflow.com/a/45122187/2901002
#change order of columns
a = np.arange(n).astype(str)
cols = [item for x in a for item in ('offset_' + x, 'feature_' + x)]
df = pd.concat([df1, df2], axis=1)[cols]
print (df)
   offset_0  feature_0  offset_1  feature_1  offset_2  feature_2
0       NaN        NaN       0.0     2000.0       7.0     2000.0
1     103.0     2200.0      91.0        0.0      88.0        0.0
2     137.0     2200.0     131.0     2200.0     126.0     2200.0
3     161.0      200.0     159.0      200.0     157.0      200.0
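
For comparison, the same selection can be sketched with `Series.shift` only, without strides. This is an alternative under stated assumptions, not the method used above: the helper name `previous_windows` is hypothetical, columns are ordered nearest row first, and the first row is not reversed, so its NaN ends up in the last column instead of the first.

```python
import pandas as pd

df = pd.DataFrame({
    "offset": [0, 7, 16, 27, 36, 40, 46, 57, 63, 78, 88, 91, 103,
               109, 113, 126, 131, 137, 142, 152, 157, 159, 161, 167, 170],
    "feature": [2000, 2000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2200,
                0, 2200, 2200, 2200, 2200, 0, 0, 200, 200, 200, 0, 200],
})

def previous_windows(frame, n=3):
    s = frame["feature"]
    # A row is a hit when it is non-zero and the following row is exactly 0.
    hit = s.ne(0) & s.shift(-1).eq(0)
    parts = {}
    for k in range(n):
        # shift(k) brings the value from k rows earlier onto the current row.
        parts[f"offset_{k}"] = frame["offset"].shift(k)
        parts[f"feature_{k}"] = s.shift(k)
    return pd.DataFrame(parts)[hit].reset_index(drop=True)

prev = previous_windows(df)
print(prev)
```

The last row of the frame can never be a hit here, because its shifted neighbour is NaN and `eq(0)` is False for NaN.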

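EDIT: To also get the current value and the next two values wherever the value changes from 0 to positive: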

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

n = 3
x = np.concatenate([[np.nan] * (n - 1), df['feature'].values])
arr1 = rolling_window(x, n)[:, ::-1]
#print (arr1)

x = np.concatenate([[np.nan] * (n - 1), df['offset'].values])
arr2 = rolling_window(x, n)[:, ::-1]
#print (arr2)

x1 = np.concatenate([arr1[:, 0], [np.nan] * (n - 1)])
arr11 = rolling_window(x1, n)
#print (arr11)

x2 = np.concatenate([arr2[:, 0], [np.nan] * (n - 1)])
arr22 = rolling_window(x2, n)
#print (arr22)

#rows where the value is non-zero and the previous row's value is 0
m1 = np.append(False, arr11[:-1, 0] == 0) & (arr11[:, 0] != 0)

arr11 = arr11[m1]
arr22 = arr22[m1]

df11 = pd.DataFrame(arr11).rename(columns = lambda x: x + 4).add_prefix('feature_')
df22 = pd.DataFrame(arr22).rename(columns = lambda x: x + 4).add_prefix('offset_')
print (df11)
   feature_4  feature_5  feature_6
0     2200.0        0.0     2200.0
1     2200.0     2200.0     2200.0
2      200.0      200.0      200.0
3      200.0        NaN        NaN

print (df22)
   offset_4  offset_5  offset_6
0     103.0     109.0     113.0
1     113.0     126.0     131.0
2     157.0     159.0     161.0
3     170.0       NaN       NaN

a = np.arange(4, n + 4).astype(str)
cols = [item for x in a for item in ('offset_' + x, 'feature_' + x)]
df1 = pd.concat([df11, df22], axis=1)[cols]
print (df1)
   offset_4  feature_4  offset_5  feature_5  offset_6  feature_6
0     103.0     2200.0     109.0        0.0     113.0     2200.0
1     113.0     2200.0     126.0     2200.0     131.0     2200.0
2     157.0      200.0     159.0      200.0     161.0      200.0
3     170.0      200.0       NaN        NaN       NaN        NaN
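
The forward-looking table above can also be sketched with negative shifts. This is an illustrative alternative, not the code from the answer: the helper `next_windows` and its `start` parameter are hypothetical, and the data is only the tail of the sample (offsets 91 to 170) to keep the example short.

```python
import pandas as pd

# Tail of the sample data from the question.
df = pd.DataFrame({
    "offset": [91, 103, 109, 113, 126, 131, 137, 142, 152, 157, 159, 161, 167, 170],
    "feature": [0, 2200, 0, 2200, 2200, 2200, 2200, 0, 0, 200, 200, 200, 0, 200],
})

def next_windows(frame, n=3, start=4):
    s = frame["feature"]
    # A row is a hit when it is non-zero and the row before it is exactly 0.
    hit = s.ne(0) & s.shift(1).eq(0)
    parts = {}
    for k in range(n):
        # shift(-k) brings the value from k rows later onto the current row.
        parts[f"offset_{start + k}"] = frame["offset"].shift(-k)
        parts[f"feature_{start + k}"] = s.shift(-k)
    return pd.DataFrame(parts)[hit].reset_index(drop=True)

nxt = next_windows(df)
print(nxt)
```

As in the table above, the hit at offset 170 has no following rows, so its remaining columns are NaN.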

m =  np.append(arr1[1:, 0] == 0, False) & (arr1[:, 0] != 0)
arr1 = arr1[m] 
arr2 = arr2[m] 

arr1[:1] = arr1[:1, ::-1]
arr2[:1] = arr2[:1, ::-1]

df1 = pd.DataFrame(arr1).add_prefix('feature_')
df2 = pd.DataFrame(arr2).add_prefix('offset_')
print (df1)
   feature_0  feature_1  feature_2
0        NaN     2000.0     2000.0
1     2200.0        0.0        0.0
2     2200.0     2200.0     2200.0
3      200.0      200.0      200.0

print (df2)
   offset_0  offset_1  offset_2
0       NaN       0.0       7.0
1     103.0      91.0      88.0
2     137.0     131.0     126.0
3     161.0     159.0     157.0

a = np.arange(n).astype(str)
cols = [item for x in a for item in ('offset_' + x, 'feature_' + x)]
df = pd.concat([df1, df2], axis=1)[cols]
print (df)
   offset_0  feature_0  offset_1  feature_1  offset_2  feature_2
0       NaN        NaN       0.0     2000.0       7.0     2000.0
1     103.0     2200.0      91.0        0.0      88.0        0.0
2     137.0     2200.0     131.0     2200.0     126.0     2200.0
3     161.0      200.0     159.0      200.0     157.0      200.0
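
The code above ignores DocumentId, while the comments on this answer ask that windows never reach into a previous document. One possible way to respect that is to do the shifts per group. This is a sketch under assumptions, not part of the answer as posted: the helper `previous_windows_per_doc` is hypothetical, and a change is only counted when both rows belong to the same DocumentId, so the 103 to 109 change that crosses documents is dropped.

```python
import pandas as pd

df = pd.DataFrame({
    "DocumentId": [0] * 13 + [1] * 12,
    "offset": [0, 7, 16, 27, 36, 40, 46, 57, 63, 78, 88, 91, 103,
               109, 113, 126, 131, 137, 142, 152, 157, 159, 161, 167, 170],
    "feature": [2000, 2000, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2200,
                0, 2200, 2200, 2200, 2200, 0, 0, 200, 200, 200, 0, 200],
})

def previous_windows_per_doc(frame, n=3):
    g = frame.groupby("DocumentId")
    # Next value within the same document; NaN on each document's last row.
    nxt = g["feature"].shift(-1)
    hit = frame["feature"].ne(0) & nxt.eq(0)
    parts = {"DocumentId": frame["DocumentId"]}
    for k in range(n):
        # k == 0 is the row itself; k > 0 looks back inside the same document only.
        parts[f"offset_{k}"] = frame["offset"] if k == 0 else g["offset"].shift(k)
        parts[f"feature_{k}"] = frame["feature"] if k == 0 else g["feature"].shift(k)
    return pd.DataFrame(parts)[hit].reset_index(drop=True)

res = previous_windows_per_doc(df)
print(res)
```

Whether a change across a document boundary should count is a decision for the asker; this sketch simply excludes it.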

One thing I forgot here is that it is based on the document ID. I will add that to the question. — ganesh kaspate, Nov 04 '19 at 15:15
So the thing is that if the document ID has changed, it should not take the previous values from the previous document ID. — ganesh kaspate, Nov 04 '19 at 15:20
Your solution is perfect. I actually also have to take the next three values where the value changes from zero to positive. What changes do I need to make in the same code? — ganesh kaspate, Nov 05 '19 at 06:33
So that the current dataframe will now have 4, 5, 6 in the same row? — ganesh kaspate, Nov 05 '19 at 06:34
I have just updated the question with the next values for the same. Could you please let me know what change I need to make? — ganesh kaspate, Nov 05 '19 at 06:49
Yes, your solution works for the first change, that is, for the previous three, and I also have to take the next two, as I have added in the question. — ganesh kaspate, Nov 05 '19 at 08:33
@GaneshKaspate - Can you explain the logic for the edited data? What does "next 2" mean? — jezrael, Nov 05 '19 at 08:57
I mean that we are taking the previous three values in the first answer. Now I want the next two values in the same way, but starting where the value changes from zero to positive, together with that row's own value. — ganesh kaspate, Nov 05 '19 at 09:01
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/201871/discussion-between-ganesh-kaspate-and-jezrael). — ganesh kaspate, Nov 05 '19 at 09:10
Could you please look at https://stackoverflow.com/questions/58728523/find-an-replace-the-values-in-between-the-two-indexes-in-a-python-data-frame if you have some time? — ganesh kaspate, Nov 06 '19 at 11:15
