I've tried to apply the solution provided in this question to my real data: Selecting rows in a MultiIndexed dataframe. Somehow I cannot get the results it should give. I've attached both the dataframe to select from, as well as the result.
What I need;
Rows 3, 11 AND 12 should be returned (when you add the 4 columns consecutively, 12 should be selected as well. It isn't now).
df_test = pd.read_csv('df_test.csv')
def find_window(df):
v = df.values
s = np.vstack([np.zeros((1, v.shape[1])), v.cumsum(0)])
threshold = 0
r, c = np.triu_indices(s.shape[0], 1)
d = (c - r)[:, None]
e = s[c] - s[r]
mask = (e / d < threshold).all(1)
rng = np.arange(mask.shape[0])
if mask.any():
idx = rng[mask][d[mask].argmax()]
i0, i1 = r[idx], c[idx]
return pd.DataFrame(
v[i0:i1],
df.loc[df.name].index[i0:i1],
df.columns
)
cols = ['2012', '2013', '2014', '2015']
df_test.groupby(level=0)[cols].apply(find_window)
csv_file is here: https://docs.google.com/spreadsheets/d/19oOoBdAs3xRBWq6HReizlqrkWoQR2159nk8GWoR_4-g/edit?usp=sharing
EDIT: Correct dataframes added.
Note: Blue frame = rows which should be returned, yellow frames is consecutive column values which are < 0 (threshold).