Context
I'm working on a DataFrame df with lots of columns filled with numerical values:
df
lorem ipsum | dolor sic | ... | (hundreds of cols)
---------------------------------------------------------
0.5 | -6.2 | ... | 79.8
-26.1 | 6200.0 | ... | -65.2
150.0 | 3.14 | ... | 1.008
Separately, I have a list of columns, list_cols:
list_cols = ['lorem ipsum', 'dolor sic', ... ]  # arbitrary length; len(list_cols) <= len(df.columns), and every entry is a valid column of df
I want to obtain 2 DataFrames:
- one that contains all rows where value < 0 for at least one of the columns in list_cols (i.e. the per-column conditions are combined with OR); let's call it negative_values_matches
- one that contains the remaining rows of the DataFrame; let's call it positive_values_matches
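Whatever boolean mask is used, the "remaining" DataFrame is simply its complement. A minimal sketch, assuming a boolean Series mask aligned with df's index and containing no missing values; split_by_mask is a hypothetical helper name and the three-row frame is toy data:

```python
from typing import Tuple

import pandas as pd


def split_by_mask(df: pd.DataFrame, mask: pd.Series) -> Tuple[pd.DataFrame, pd.DataFrame]:
    """Hypothetical helper: return (rows where mask is True, all remaining rows)."""
    # ~mask is the element-wise negation of mask, so the two results never
    # overlap and together contain every row of df exactly once.
    return df[mask], df[~mask]


# Toy illustration with a hand-written mask
df = pd.DataFrame({"a": [1, 2, 3]})
mask = pd.Series([True, False, True], index=df.index)
matched, remaining = split_by_mask(df, mask)
```

Using ~mask rather than a second, hand-negated condition guarantees that no row is lost or duplicated between the two results.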
Expected result example
For list_cols = ['lorem ipsum', 'dolor sic'], I should obtain the following split, where negative_values_matches holds the rows with at least 1 strictly negative value in list_cols:
negative_values_matches
lorem ipsum | dolor sic | ... | (hundreds of cols)
---------------------------------------------------------
0.5 | -6.2 | ... | 79.8
-26.1 | 6200.0 | ... | -65.2
positive_values_matches
lorem ipsum | dolor sic | ... | (hundreds of cols)
---------------------------------------------------------
150.0 | 3.14 | ... | 1.008
I don't want to write this kind of code myself:
negative_values_matches = df[ (criterion1 | criterion2 | ... | criterionn)]
positive_values_matches = df[~(criterion1 | criterion2 | ... | criterionn)]
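(where criterionk is a boolean evaluation for column k, for instance (df[col_k] < 0); the parentheses are intentional, because in Python | binds more tightly than comparison operators such as <, so each comparison must be wrapped when combined with |)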
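For reference, the chained criteria above can be generated from list_cols instead of being typed out, by folding a list of per-column boolean Series with | through functools.reduce and operator.or_. A sketch, assuming the sample rows and the two named columns from the example above (the other columns are left out) and a non-empty list_cols, since reduce without an initial value raises TypeError on an empty list:

```python
import functools
import operator

import pandas as pd

# Sample data: the three rows and two named columns shown in the question
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
})
list_cols = ["lorem ipsum", "dolor sic"]

# One boolean Series per column: criterion_k = (df[col_k] < 0)
criteria = [df[col] < 0 for col in list_cols]

# Fold the list with |, i.e. criterion_1 | criterion_2 | ... | criterion_n
mask = functools.reduce(operator.or_, criteria)

negative_values_matches = df[mask]
positive_values_matches = df[~mask]
```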
The idea is to have a programmatic approach. I'm mainly looking for an array of booleans, so I can then use Boolean indexing (see Pandas documentation).
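A vectorized way to get such an array of booleans, sketched under the same assumption (sample data restricted to the two named columns): select the columns by name, compare element-wise with .lt(0), then reduce each row with .any(axis=1), which acts as an OR across the selected columns.

```python
import pandas as pd

# Sample data: the three rows and two named columns shown in the question
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
})
list_cols = ["lorem ipsum", "dolor sic"]

# df[list_cols] selects columns by label (spaces in names are fine here),
# .lt(0) yields a boolean DataFrame, and .any(axis=1) ORs across each row.
mask = df[list_cols].lt(0).any(axis=1)

negative_values_matches = df[mask]
positive_values_matches = df[~mask]
```

NaN compares as False under .lt(0), so a row whose only non-negative entries in list_cols are missing values ends up in positive_values_matches.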
As far as I can tell, these posts are not exactly what I am talking about:
- Filtering DataFrame on multiple conditions in Pandas
- Drop rows on multiple conditions in pandas dataframe
- Pandas: np.where with multiple conditions on dataframes
- Pandas DataFrame : How to select rows on multiple conditions? This one is a bit closer to what I'm looking for. However, it relies on building a query string, which might not work with "exotic" column names such as ones containing spaces (or at least I don't know how to make it work)
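Regarding the string-based approach: DataFrame.query accepts column names wrapped in backticks, which is how names containing spaces can be referenced (supported since pandas 0.25, as far as I know). A sketch under the same sample-data assumption; I have not checked how it behaves with names that themselves contain a backtick:

```python
import pandas as pd

# Sample data: the three rows and two named columns shown in the question
df = pd.DataFrame({
    "lorem ipsum": [0.5, -26.1, 150.0],
    "dolor sic": [-6.2, 6200.0, 3.14],
})
list_cols = ["lorem ipsum", "dolor sic"]

# Build "(`lorem ipsum` < 0) | (`dolor sic` < 0)"; backticks quote names with spaces
expr = " | ".join(f"(`{col}` < 0)" for col in list_cols)

negative_values_matches = df.query(expr)
positive_values_matches = df.query(f"not ({expr})")
```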
I can't figure out how to chain the boolean evaluations on my DataFrame together with the OR operator and obtain the correct row split.
What can I do?