
I have a DataFrame with the columns value, ID, distance, and distance2. I want to extract the previous row whenever distance changes from 0 to a value in the range 4000 to 5000, or distance2 changes from 0 to a value in the range 3000 to 4000. As the desired output below shows, value and ID should come from the previous row, and distance and distance2 from the row where the change occurs.

Here is my example df:

import pandas as pd

df = pd.DataFrame({'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
                   'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
                   'distance': [0, 0, 0, 4008, 0, 0, 4820, 0, 0, 0],
                   'distance2': [0, 0, 0, 3006, 0, 0, 0, 1, 3990, 0]})





    value  ID  distance  distance2
0      3   2         0          0
1      4   2         0          0
2      7   8         0          0
3      8   8      4008       3006
4     11   8         0          0
5     20   2         0          0
6     15   2      4820          0
7     20   2         0          1
8     15   5         0       3990
9     16   5         0          0
Desired output:

  value  ID  distance  distance2
0      7   8      4008       3006
1     20   2      4820          0
2     20   2         0       3990
Nickel

1 Answer


I tried modifying the accepted answer to "iterrows pandas get next rows value", and this seems to work:

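row_iterator = df.iterrows()
_, last = next(row_iterator)  # start with the first row as the "previous" row
rows = []

for index, row in row_iterator:
    # Match when either column goes from exactly 0 into its target range
    if ((4000 < row.distance < 5000) and last.distance == 0) or \
       ((3000 < row.distance2 < 4000) and last.distance2 == 0):
        # value and ID come from the previous row, distances from the current row
        rows.append([last.value, last.ID, row.distance, row.distance2])
    last = row
df_new = pd.DataFrame(rows, columns=df.columns)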
AdibP
  • With this I can get the desired output, but it will be slow on my original data, which has millions of rows. – Nickel Oct 31 '19 at 05:45
  • That's true, but compared to the previous answer that uses `.diff()`, this actually runs faster on the data above. Maybe that's because the data is too small. – AdibP Oct 31 '19 at 06:23
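
Given the concern about millions of rows, a vectorized variant using `shift` may be worth trying instead of a Python-level loop. The following is only a sketch, not a benchmarked solution: it assumes the same strict bounds and the same "previous value is exactly 0" test as the loop above, and the helper `in_open_range` and the names `prev`, `hit`, and `result` are made up for illustration. Note that under this exact-zero test, index 8 of the sample data is not matched, because the preceding distance2 is 1 rather than 0. Matching that row would require relaxing the "from 0" condition, which is a decision the question leaves open.

```python
import pandas as pd

df = pd.DataFrame({
    'value': [3, 4, 7, 8, 11, 20, 15, 20, 15, 16],
    'ID': [2, 2, 8, 8, 8, 2, 2, 2, 5, 5],
    'distance': [0, 0, 0, 4008, 0, 0, 4820, 0, 0, 0],
    'distance2': [0, 0, 0, 3006, 0, 0, 0, 1, 3990, 0],
})


def in_open_range(s, low, high):
    """Hypothetical helper: True where low < s < high (strict bounds)."""
    return (s > low) & (s < high)


# Align each row with the row before it; the first row gets NaN
prev = df.shift(1)

# A hit is a row whose value is in range while the previous value was exactly 0
hit = (
    (in_open_range(df['distance'], 4000, 5000) & prev['distance'].eq(0))
    | (in_open_range(df['distance2'], 3000, 4000) & prev['distance2'].eq(0))
)

# value and ID come from the previous row, distances from the current row.
# shift() turns integer columns into floats, so cast them back.
result = pd.DataFrame({
    'value': prev.loc[hit, 'value'].astype(df['value'].dtype),
    'ID': prev.loc[hit, 'ID'].astype(df['ID'].dtype),
    'distance': df.loc[hit, 'distance'],
    'distance2': df.loc[hit, 'distance2'],
}).reset_index(drop=True)

print(result)
```

Because the comparisons operate on whole columns at once, the per-row overhead of `iterrows` is avoided. Whether this is actually faster on the real data would need to be measured.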