Keep upper n rows of a pandas dataframe based on condition

Question

How would I delete all rows from a dataframe that come after the row where a certain condition is fulfilled? As an example, I have the following dataframe:

import pandas as pd
xEnd=1
yEnd=2
df = pd.DataFrame({'x':[1,1,1,2,2,2], 'y':[1,2,3,3,4,3], 'id':[0,1,2,3,4,5]})

How would I get a dataframe that drops the last 4 rows and keeps the upper 2, since the condition x == xEnd and y == yEnd is fulfilled in row 2? EDIT: I should have mentioned that the dataframe is not necessarily ascending. It could also be descending, and I would still like to keep the upper rows.

score 3 · Answer 1 · answered Oct 27 '18 at 15:49

To slice your dataframe up to the first row where a condition across 2 series is satisfied, first calculate the required index and then slice via iloc.

You can calculate the index via set_index, isin and np.ndarray.argmax:

idx = df.set_index(['x', 'y']).index.isin([(xEnd, yEnd)]).argmax()
res = df.iloc[:idx+1]

print(res)

   x  y  id
0  1  1   0
1  1  2   1

If you need better performance, see Efficiently return the index of the first value satisfying condition in array.
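
A minimal sketch of the same idea using a plain boolean mask instead of `set_index`, assuming the columns are named `x` and `y` as in the question. The helper name `keep_until_first_match` is hypothetical, and returning an empty frame when nothing matches is an assumption (plain `argmax` on an all-False mask returns 0, which would silently keep the first row):

```python
import pandas as pd

xEnd, yEnd = 1, 2
df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2],
                   'y': [1, 2, 3, 3, 4, 3],
                   'id': [0, 1, 2, 3, 4, 5]})


def keep_until_first_match(frame, x_end, y_end):
    """Return the rows up to and including the first row where
    x == x_end and y == y_end (hypothetical helper)."""
    # Row-wise boolean mask: True only where both columns match.
    mask = ((frame['x'] == x_end) & (frame['y'] == y_end)).to_numpy()
    if not mask.any():
        # Assumption: no match means "keep nothing" rather than raising.
        return frame.iloc[0:0]
    # argmax gives the position of the first True; +1 keeps that row too.
    return frame.iloc[:mask.argmax() + 1]


res = keep_until_first_match(df, xEnd, yEnd)
print(res)
```

Because the slice is positional, this should not depend on whether the values in the frame are ascending or descending.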

jpp

That works fine, thanks a lot! I changed it to `idx+2` to keep the row itself as well. – Mauritius Oct 27 '18 at 15:54
@Mauritius, That's strange: `idx+1` should do it (as in my example). – jpp Oct 27 '18 at 15:58
You are right, of course! I realized it just now when running the code. My first try did something weird, but yes, `idx+1` is correct. Thanks again for your help! – Mauritius Oct 27 '18 at 16:02

Christian Sloper · Accepted Answer · 2018-10-27T15:51:47.013

1

I'm not 100% sure I understand correctly, but you can filter your dataframe like this:

 df[(df.x <= xEnd) & (df.y <= yEnd)]

This yields the dataframe:

   id   x   y   
0   0   1   1   
1   1   1   2

If x and y are not strictly increasing and you want everything up to and including the row that satisfies the condition:

 df[df.index <= (df[(df.x == xEnd) & (df.y == yEnd)]).index.tolist()]
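
The one-liner above compares the index against a list, which is only reliable when exactly one row matches. A hedged sketch of a variant that compares against a single label instead; the helper name `rows_up_to_match` is hypothetical, and both returning an empty frame when nothing matches and cutting at the first match when there are several are assumptions:

```python
import pandas as pd

xEnd, yEnd = 1, 2
df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2],
                   'y': [1, 2, 3, 3, 4, 3],
                   'id': [0, 1, 2, 3, 4, 5]})


def rows_up_to_match(frame, x_end, y_end):
    """Keep rows whose index label is <= the label of the first match
    (hypothetical helper)."""
    mask = ((frame.x == x_end) & (frame.y == y_end)).to_numpy()
    # Index labels of every matching row; may be empty or hold several labels.
    hits = frame.index[mask]
    if len(hits) == 0:
        # Assumption: return an empty frame instead of failing on the comparison.
        return frame.iloc[0:0]
    # Comparing against one scalar label avoids any list-vs-index length mismatch.
    # Like the one-liner, this relies on a sorted index such as the default RangeIndex.
    return frame[frame.index <= hits[0]]


print(rows_up_to_match(df, xEnd, yEnd))
print(rows_up_to_match(df, 5, 5))  # no row matches
```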

edited Oct 27 '18 at 15:51

answered Oct 27 '18 at 15:45

That works fine for the example, but it is not guaranteed that the upper rows are smaller than `xEnd` and `yEnd`. So I must somehow get the index of the row that fulfills the condition. – Mauritius Oct 27 '18 at 15:48
(assumes that xEnd yEnd is only satisfied in one row) – Christian Sloper Oct 27 '18 at 15:52
Works fine, thanks! Yes, it should only be contained in one row. – Mauritius Oct 27 '18 at 15:56
I get a `ValueError: operands could not be broadcast together with shapes (30,) (0,)` from time to time. Do you know what it means? @Christian Sloper – Mauritius Oct 28 '18 at 17:17
Presumably there is no row that satisfies yEnd. – Christian Sloper Oct 28 '18 at 17:18
https://stackoverflow.com/questions/53123712/valueerror-operands-could-not-be-broadcast-together-with-shapes-6-0-when – Mauritius Nov 02 '18 at 18:05
This might interest you. I got a weird error using your approach. – Mauritius Nov 02 '18 at 18:06

SaKu. · Answer 3 · 2018-10-27T15:51:26.220

0

df = df.iloc[0:yEnd, :]

This selects just the first two rows and keeps all columns. You can store the result in a new dataframe or reuse the same variable name.
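
A small sketch of positional slicing, assuming the number of rows to keep is already known; `n` is a hypothetical name that is not part of the question. Note that `iloc` excludes the end position, so `:n` yields exactly `n` rows, and that this does not look at the x/y condition at all:

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 1, 1, 2, 2, 2],
                   'y': [1, 2, 3, 3, 4, 3],
                   'id': [0, 1, 2, 3, 4, 5]})

n = 2  # hypothetical: number of leading rows to keep

# Positional slice: rows 0 .. n-1, all columns.
first_n = df.iloc[:n, :]

# head(n) selects the same leading rows.
same_rows = df.head(n)

print(first_n)
```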

edited Oct 27 '18 at 15:51

answered Oct 27 '18 at 15:45
