I'd like to extract the bins for 'S' individually, where each column (X&Y) > 0.5, or multiple bins > 0.5 * 'number of rows'.
In the example;
for 'AR1' should only bin 4 be selected, because 'X' and 'Y' are > 0.5 (blue indicated)
for 'PO1' should bins 1, 2, 3 and 4 be selected, because 'X' and 'Y' are > (4 * 0.5) (yellow indicated).
I've tried this earlier with a for loop
, but that didn't work correctly;
Selecting multiple (neighboring) rows conditionally
np.random.seed(0)
N = 20
S = ['AR1', 'PO1']
df = pd.DataFrame(
{'X':np.random.uniform(-1,1,N),
'Y':np.random.uniform(-1,1,N),
'S':np.random.choice(S,N),
})
df['bins_X'] = df.groupby('S')['X'].apply(pd.qcut, q=5, labels=np.arange(5)) # create bins per column 'S'
def func(df): # create function to group per 'S' and their bins
df1 = df.groupby(['S','bins_X']).sum()
new_cols= list(zip(df1.columns.get_level_values(0)))
df1.columns = pd.MultiIndex.from_tuples(new_cols)
return df1
print func(df)
EDIT
What it should look like is the df as shown in the question, but the rows that do not qualify filtered out. What I check for is this ; the values in X and Y > 0.5 for any row(bin) separately or combined. Combinations of rows only consecutively, 2, 3, 4 or 5 rows combined.
I.e, the combinations of rows for 0 are then ; 0+1, 0+1+2, 0+1+2+3 and 0+1+2+3+4. For 1 ; 1+2, 1+2+3 and 1+2+3+4 etc.
Multiple rows would sum to the number of rows x 0.5, X and Y would have to be > 2.5 for rows 0 to 4 for example.
EDIT2: @JohnE and piRSquared, both your solutions work, which one however would work better when there are other columns in the dataframe that should not be evaluated?
In addition, what if I would want to addin an additional condition in your solutions?
EDIT3: @piRSquared, When subsetting some columns I only get those returned, where I would need all of them, not only the subsetted.
Could you assist? Thanks.