1

How would I delete all rows from a dataframe that come after the first row fulfilling a certain condition? As an example, I have the following dataframe:

import pandas as pd
xEnd = 1
yEnd = 2
df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2], 'y': [1, 2, 3, 3, 4, 3], 'id': [0, 1, 2, 3, 4, 5]})

How would I get a dataframe that drops the last 4 rows and keeps the first 2, given that the condition x == xEnd and y == yEnd is fulfilled in the second row? Edit: I should have mentioned that the dataframe is not necessarily ascending. It could also be descending, and I would still like to keep the rows above the match.

jpp
Mauritius

3 Answers

3

To slice your dataframe up to the first row where a condition across two series is satisfied, first calculate the required index and then slice via iloc.

You can calculate the index via set_index, isin and np.ndarray.argmax:

# Mark rows whose (x, y) pair equals (xEnd, yEnd); argmax gives the position of the first True
idx = df.set_index(['x', 'y']).index.isin([(xEnd, yEnd)]).argmax()
res = df.iloc[:idx+1]

print(res)

   x  y  id
0  1  1   0
1  1  2   1
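
One caveat with argmax on a boolean array: it returns 0 when no element is True, so a slice built from it would silently keep only the first row if the condition is never met. A minimal sketch of a guard, assuming you want the whole dataframe back in that case (the helper name `rows_until_match` is hypothetical, not part of pandas):

```python
import pandas as pd

def rows_until_match(df, x_end, y_end):
    """Return rows up to and including the first row with x == x_end and y == y_end.

    Hypothetical helper: returns the dataframe unchanged when no row matches.
    """
    mask = ((df['x'] == x_end) & (df['y'] == y_end)).to_numpy()
    if not mask.any():
        # argmax would return 0 here, which would be misleading
        return df
    idx = mask.argmax()  # position (not label) of the first True
    return df.iloc[:idx + 1]

df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2],
                   'y': [1, 2, 3, 3, 4, 3],
                   'id': [0, 1, 2, 3, 4, 5]})

res = rows_until_match(df, 1, 2)             # match in the second row
no_match = rows_until_match(df, 9, 9)        # condition never fulfilled
desc = rows_until_match(df.iloc[::-1], 2, 4) # same data in descending order
```

Because the slice is positional, the same call should work whether the rows are ascending or descending.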

If you need better performance, see Efficiently return the index of the first value satisfying condition in array.

jpp
  • That works fine, thanks a lot! I made it to `idx+2`, to keep the row itself as well. – Mauritius Oct 27 '18 at 15:54
  • @Mauritius, That's strange: `idx+1` should do it (as in my example). – jpp Oct 27 '18 at 15:58
  • You are right, of course! I realized that just now when compiling. My first try did something weird, but yes, `idx+1` is correct. Thanks again for your help! – Mauritius Oct 27 '18 at 16:02
1

Not 100% sure I understand correctly, but you can filter your dataframe like this:

 df[(df.x <= xEnd) & (df.y <= yEnd)]

This yields the dataframe:

   x  y  id
0  1  1   0
1  1  2   1

If x and y are not strictly increasing and you want everything above the row that satisfies the condition:

 df[df.index <= df[(df.x == xEnd) & (df.y == yEnd)].index[0]]
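
The index comparison above assumes the index labels are ascending numbers. As a hedged alternative that depends only on row order, one could count how many matches occur strictly before each row and keep the rows where that count is zero. The sketch below uses letter labels purely as an illustrative assumption to show that the labels play no role:

```python
import pandas as pd

xEnd, yEnd = 1, 2
# Illustrative assumption: non-numeric, descending index labels
df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2],
                   'y': [1, 2, 3, 3, 4, 3],
                   'id': [0, 1, 2, 3, 4, 5]},
                  index=list('fedcba'))

hit = (df['x'] == xEnd) & (df['y'] == yEnd)
# Running count of matches, minus the current row's own match:
# the number of matches seen strictly before each row.
seen_before = hit.cumsum() - hit.astype(int)
# Zero means the row is at or above the first match.
res = df[seen_before == 0]
```

If no row matches, `seen_before` is zero everywhere and the whole dataframe is kept.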
Christian Sloper
0

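df = df.iloc[0:yEnd, :]

This selects just the first two rows (positions 0 and 1) and keeps all columns. You can store the result in a new dataframe or reuse the same variable name, as here. Note that it relies on the matching row sitting at position yEnd - 1, which is a coincidence of this example.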

SaKu.