How can I exclude rows from a pandas dataframe depending on conditions across all columns

Question

I have a df:

      population     plot1     plot2     plot3     plot4
0    Population1  Species1  Species1  Species2  Species2
1    Population2  Species4  Species2  Species3  Species4
2    Population3  Species1  Species2  Species1  Species2
3    Population4  Species4  Species4  Species4  Species4
4    Population5  Species2  Species2  Species4  Species2
5    Population6  Species4  Species3  Species3  Species4
6    Population7  Species3  Species4  Species1  Species3
7    Population8  Species4  Species4  Species4  Species4
8    Population9  Species3  Species4  Species2  Species3
9   Population10  Species1  Species3  Species2  Species4
10  Population11  Species2  Species4  Species2  Species4

I want to create a new dataframe from which all rows (populations) in which Species4 occurs more than once are removed. I've tried several approaches using .value_counts(), but I can't work out how to apply it across the entire dataframe at once rather than looping through all rows (which takes a long time on my large dataset).

So, I tried:

dat.drop(dat.value_counts()['Species4'] > 1)

but .value_counts() cannot be applied to the entire df.

Chris · Answer 1 · 2019-12-12T08:47:22.780

6

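Use pandas.DataFrame.eq to build a boolean mask of the cells equal to 'Species4', sum it along each row to count the matches, and keep only the rows with at most one match: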

new_df = df[df.eq('Species4').sum(1).le(1)]
# or
new_df = df[~df.eq('Species4').sum(1).gt(1)]
print(new_df)

Output:

     population     plot1     plot2     plot3     plot4
0   Population1  Species1  Species1  Species2  Species2
2   Population3  Species1  Species2  Species1  Species2
4   Population5  Species2  Species2  Species4  Species2
6   Population7  Species3  Species4  Species1  Species3
8   Population9  Species3  Species4  Species2  Species3
9  Population10  Species1  Species3  Species2  Species4
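For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It rebuilds only the first four rows of the question's table, and it assumes you would rather count matches in the plot columns only; the names plot_cols and counts are illustrative and not part of the answer above.

```python
import pandas as pd

# First four rows of the table from the question (a reduced sample).
df = pd.DataFrame({
    "population": ["Population1", "Population2", "Population3", "Population4"],
    "plot1": ["Species1", "Species4", "Species1", "Species4"],
    "plot2": ["Species1", "Species2", "Species2", "Species4"],
    "plot3": ["Species2", "Species3", "Species1", "Species4"],
    "plot4": ["Species2", "Species4", "Species2", "Species4"],
})

# Assumption: restrict the comparison to the plot columns so a label in
# "population" can never be counted as a match.
plot_cols = df.columns.drop("population")

# eq() gives a boolean frame; summing along axis=1 counts matches per row.
counts = df[plot_cols].eq("Species4").sum(axis=1)

# Keep rows where Species4 occurs at most once.
new_df = df[counts.le(1)]
print(new_df)
```

On this sample, Population2 and Population4 are dropped because Species4 appears in two and four of their plots respectively. The whole operation is vectorised, so no Python-level loop over rows is needed.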

edited Dec 12 '19 at 08:47

answered Dec 12 '19 at 08:33

Chris


Thanks, how do I stipulate more than once (2 or more)? – user3329732 Dec 12 '19 at 08:42
@user3329732 Filtering out _more than once_ is the same as keeping _once or less_. I've added another line that may read more nicely :) – Chris Dec 12 '19 at 08:48

score 1 · Answer 2 · answered Dec 12 '19 at 09:15

1

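Use boolean indexing like this to combine multiple conditions, or any other combination of boolean functions. Note that the expression below keeps the rows in which both Species4 and Species1 occur more than once; to exclude rows instead, negate the mask with ~.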

df[((df == "Species4").sum(axis=1) > 1) & ((df == "Species1").sum(axis=1) > 1)]
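As a sketch of how the same idea can be turned into an exclusion filter, the example below drops rows where either species is repeated. The sample rows are taken from the question's table; the variable names too_many_sp4 and too_many_sp1 are made up for illustration, and the choice of "or" rather than "and" is an assumption about what you might want.

```python
import pandas as pd

# Populations 1, 2, 3 and 5 from the question's table.
df = pd.DataFrame({
    "population": ["Population1", "Population2", "Population3", "Population5"],
    "plot1": ["Species1", "Species4", "Species1", "Species2"],
    "plot2": ["Species1", "Species2", "Species2", "Species2"],
    "plot3": ["Species2", "Species3", "Species1", "Species4"],
    "plot4": ["Species2", "Species4", "Species2", "Species2"],
})

# One boolean Series per condition: True where the species occurs more than once.
too_many_sp4 = (df == "Species4").sum(axis=1) > 1
too_many_sp1 = (df == "Species1").sum(axis=1) > 1

# Combine with | (or) and negate with ~ to exclude the matching rows.
# The parentheses matter because ~, & and | bind tighter than comparisons.
filtered = df[~(too_many_sp4 | too_many_sp1)]
print(filtered)
```

Only Population5 survives here: Populations 1 and 3 contain Species1 twice, and Population2 contains Species4 twice. Swapping | for & would exclude only the rows that satisfy both conditions at once.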

answered Dec 12 '19 at 09:15

zong fan

