Selecting multiple (neighboring) rows conditionally

Question

I'd like to return the rows which qualify to a certain condition. I can do this for a single row, but I need this for multiple rows combined. For example 'light green' qualifies to 'XYZ' being positive and 'total' > 10, where 'Red' does not. When I combine a neighbouring row or rows, it does => 'dark green'. Can I achieve this going over all the rows and not return duplicate rows?

N = 1000

np.random.seed(0)

df = pd.DataFrame(
    {'X':np.random.uniform(-3,10,N),
     'Y':np.random.uniform(-3,10,N),
     'Z':np.random.uniform(-3,10,N),
    })

df['total'] = df.X + df.Y + df.Z

df.head(10)

EDIT;

Desired output is 'XYZ'> 0 and 'total' > 10

`df[(df.X > 0) & (df.Y > 0) & (df.Z > 0) & (df.total > 10)]` have you tried this? — harshil9968, Jan 11 '17 at 12:13
put in an EDIT, mind you the problem here is combining consecutive rows that qualify the condition. In my example it should return all rows in 'dark green', even though they have negative values. — Zanshin, Jan 11 '17 at 12:18
try the the solution I posted above , does it work? What is the solution you're trying? — harshil9968, Jan 11 '17 at 12:19
Possible duplicate of [Using pandas to select rows using two different columns from dataframe?](http://stackoverflow.com/questions/13937022/using-pandas-to-select-rows-using-two-different-columns-from-dataframe) — harshil9968, Jan 11 '17 at 12:21
No, it isn't. The function should check multiple rows at once if they qualify as a set. So 'XYZ' are summed for rows 7, 8 & 9 and then qualify and get returned, individually they do not. — Zanshin, Jan 11 '17 at 12:26
@Zanshin I don't think I understand. What do you mean by consecutive row? Rows 7 and 8 obviously do not qualify since Y<0. Do you mean the product of X times Y times Z? — jf328, Jan 11 '17 at 13:20
as a set, 'XYZ' in the example as a set. X7 + X8 + X9 summed >0 etc. — Zanshin, Jan 11 '17 at 13:25
This is still unclear @Zanshin. You mean to say any combination of three rows where the sum of X + Y + Z is greater than 10 and each of X, Y, Z is positive? If so, you may want to think about `itertools.combinations`. — ashishsingal, Jan 11 '17 at 15:18

JohnE · Accepted Answer · 2017-01-11T19:10:13.343

Here's a try. You would maybe want to use rolling or expanding (for speed and elegance) instead of explicitly looping with range, but I did it that way so as to be able to print out the rows being used to calculate each boolean.

df = df[['X','Y','Z']]    # remove the "total" column in order
                          # to make the syntax a little cleaner

df = df.head(4)           # keep the example more manageable

for i in range(len(df)):
    for k in range( i+1, len(df)+1 ):
        df_sum = df[i:k].sum()
        print( "rows", i, "to", k, (df_sum>0).all() & (df_sum.sum()>10) )

rows 0 to 1 True
rows 0 to 2 True
rows 0 to 3 True
rows 0 to 4 True
rows 1 to 2 False
rows 1 to 3 True
rows 1 to 4 True
rows 2 to 3 True
rows 2 to 4 True
rows 3 to 4 True

yes all possible combinations for neighbouring rows (2,3..n) note: the rows can be viewed as bins. So, consecutive rows form a larger bin. — Zanshin, Jan 11 '17 at 16:09

score -1 · Answer 2 · answered Jan 11 '17 at 12:15

I am not too sure if I understood your question correctly, but if you are looking to put multiple conditions within a dataframe, you can consider this approach:

new_df = df[(df["X"] > 0) & (df["Y"] < 0)]

The & condition is for AND, while replacing that with | is for OR condition. Do remember to put the different conditions in ().

Lastly, if you want to remove duplicates, you can use this

new_df.drop_duplicates()

You can find more information about this function at here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html

Hope my answer is useful to you.

Selecting multiple (neighboring) rows conditionally

2 Answers2

Linked