How to select all rows which contain values in selected columns greater than a threshold?

Question

I'm trying to do the same thing as in this question, but I have a string-type column that I need to keep in the dataframe so I can identify which rows are which. (I guess I could do this by index, but I'd like to be able to save a step.) Is there a way to exclude a column from the .any() test but keep it in the resulting dataframe? Thanks!

Here's the code that works on all columns:

df[(df > threshold).any(axis=1)]

Here's the hard coded version I'm working with right now:

df[(df[list_of_selected_columns] > 3).any(axis=1)]

This seems a little clumsy to me, so I'm wondering if there's a better way.

My data is the same as the data in the example from the first linked question: https://stackoverflow.com/a/42613567/12399409 - semblable, May 26 '20 at 22:53

score 1 · Accepted Answer · answered May 26 '20 at 21:55

You can use .select_dtypes to choose all columns of a given type, for example the numeric ones:

df[df.select_dtypes(include='number').gt(threshold).any(axis=1)]
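As a minimal sketch of that approach (note the pandas method is `select_dtypes`, plural), assuming a small made-up frame with a string column `name` and numeric columns `a`, `b`, `c`; the data and the threshold value are illustrative and not taken from the question:

```python
import pandas as pd

# Hypothetical data: one string identifier column plus numeric columns.
df = pd.DataFrame({
    "name": ["w", "x", "y", "z"],
    "a": [1, 5, 2, 0],
    "b": [2, 1, 2, 3],
    "c": [0, 1, 9, 1],
})
threshold = 3  # illustrative value

# Build the boolean mask from the numeric columns only, then use it to
# index the full frame: the string column is ignored by the comparison
# but is still present in the result.
mask = df.select_dtypes(include="number").gt(threshold).any(axis=1)
result = df[mask]
print(result)
```

Because the mask is a boolean Series aligned on the row index, it does not matter that it was computed from a subset of the columns.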

Or a block of contiguous columns, selected by position with iloc:

df[df.iloc[:,3:6].gt(threshold).any(axis=1)]

If you want to select an arbitrary list of columns, your best option is a hard-coded list, as in your current version.
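For the arbitrary-list case, here is a sketch of two ways to keep it tidy, again on made-up data: either name the columns to test, or test everything except the identifier column. The names `selected`, `others` and `name` are hypothetical.

```python
import pandas as pd

# Hypothetical data: one string identifier column plus numeric columns.
df = pd.DataFrame({
    "name": ["w", "x", "y", "z"],
    "a": [1, 5, 2, 0],
    "b": [2, 1, 2, 3],
    "c": [0, 1, 9, 1],
})
threshold = 3  # illustrative value

# Option 1: list the columns to test explicitly.
selected = ["a", "b"]
by_list = df[df[selected].gt(threshold).any(axis=1)]

# Option 2: test every column except the identifier column.
others = df.columns.difference(["name"])
by_exclusion = df[df[others].gt(threshold).any(axis=1)]

print(by_list)
print(by_exclusion)
```

Option 2 is convenient when only one or two columns need to be left out, but it assumes every remaining column can be compared with the threshold.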
