3

I'd like to return the rows which qualify to a certain condition. I can do this for a single row, but I need this for multiple rows combined. For example 'light green' qualifies to 'XYZ' being positive and 'total' > 10, where 'Red' does not. When I combine a neighbouring row or rows, it does => 'dark green'. Can I achieve this going over all the rows and not return duplicate rows?

N = 1000

np.random.seed(0)

df = pd.DataFrame(
    {'X':np.random.uniform(-3,10,N),
     'Y':np.random.uniform(-3,10,N),
     'Z':np.random.uniform(-3,10,N),
    })

df['total'] = df.X + df.Y + df.Z

df.head(10)

enter image description here

EDIT;

Desired output is 'XYZ'> 0 and 'total' > 10

JohnE
  • 29,156
  • 8
  • 79
  • 109
Zanshin
  • 1,262
  • 1
  • 14
  • 30
  • what is the desired output and criteria? – harshil9968 Jan 11 '17 at 12:12
  • `df[(df.X > 0) & (df.Y > 0) & (df.Z > 0) & (df.total > 10)]` have you tried this? – harshil9968 Jan 11 '17 at 12:13
  • put in an EDIT, mind you the problem here is combining consecutive rows that qualify the condition. In my example it should return all rows in 'dark green', even though they have negative values. – Zanshin Jan 11 '17 at 12:18
  • try the the solution I posted above , does it work? What is the solution you're trying? – harshil9968 Jan 11 '17 at 12:19
  • Possible duplicate of [Using pandas to select rows using two different columns from dataframe?](http://stackoverflow.com/questions/13937022/using-pandas-to-select-rows-using-two-different-columns-from-dataframe) – harshil9968 Jan 11 '17 at 12:21
  • No, it isn't. The function should check multiple rows at once if they qualify as a set. So 'XYZ' are summed for rows 7, 8 & 9 and then qualify and get returned, individually they do not. – Zanshin Jan 11 '17 at 12:26
  • @Zanshin I don't think I understand. What do you mean by consecutive row? Rows 7 and 8 obviously do not qualify since Y<0. Do you mean the product of X times Y times Z? – jf328 Jan 11 '17 at 13:20
  • as a set, 'XYZ' in the example as a set. X7 + X8 + X9 summed >0 etc. – Zanshin Jan 11 '17 at 13:25
  • This is still unclear @Zanshin. You mean to say any combination of three rows where the sum of X + Y + Z is greater than 10 and each of X, Y, Z is positive? If so, you may want to think about `itertools.combinations`. – ashishsingal Jan 11 '17 at 15:18

2 Answers2

1

Here's a try. You would maybe want to use rolling or expanding (for speed and elegance) instead of explicitly looping with range, but I did it that way so as to be able to print out the rows being used to calculate each boolean.

df = df[['X','Y','Z']]    # remove the "total" column in order
                          # to make the syntax a little cleaner

df = df.head(4)           # keep the example more manageable

for i in range(len(df)):
    for k in range( i+1, len(df)+1 ):
        df_sum = df[i:k].sum()
        print( "rows", i, "to", k, (df_sum>0).all() & (df_sum.sum()>10) )

rows 0 to 1 True
rows 0 to 2 True
rows 0 to 3 True
rows 0 to 4 True
rows 1 to 2 False
rows 1 to 3 True
rows 1 to 4 True
rows 2 to 3 True
rows 2 to 4 True
rows 3 to 4 True
JohnE
  • 29,156
  • 8
  • 79
  • 109
  • yes all possible combinations for neighbouring rows (2,3..n) note: the rows can be viewed as bins. So, consecutive rows form a larger bin. – Zanshin Jan 11 '17 at 16:09
-1

I am not too sure if I understood your question correctly, but if you are looking to put multiple conditions within a dataframe, you can consider this approach:

new_df = df[(df["X"] > 0) & (df["Y"] < 0)]

The & condition is for AND, while replacing that with | is for OR condition. Do remember to put the different conditions in ().

Lastly, if you want to remove duplicates, you can use this

new_df.drop_duplicates()

You can find more information about this function at here: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.drop_duplicates.html

Hope my answer is useful to you.

Cliff Chew
  • 906
  • 1
  • 7
  • 15