I have a df:

      population     plot1     plot2     plot3     plot4
0    Population1  Species1  Species1  Species2  Species2
1    Population2  Species4  Species2  Species3  Species4
2    Population3  Species1  Species2  Species1  Species2
3    Population4  Species4  Species4  Species4  Species4
4    Population5  Species2  Species2  Species4  Species2
5    Population6  Species4  Species3  Species3  Species4
6    Population7  Species3  Species4  Species1  Species3
7    Population8  Species4  Species4  Species4  Species4
8    Population9  Species3  Species4  Species2  Species3
9   Population10  Species1  Species3  Species2  Species4
10  Population11  Species2  Species4  Species2  Species4

I want to create a new dataframe in which all rows (populations) where Species4 occurs more than once are removed. I've tried several approaches using .value_counts(), but I can't work out how to apply it across the entire dataframe at once rather than looping through every row (which takes a long time on my large dataset).

So, I tried:

dat.drop(dat.value_counts()['Species4'] > 1)

but .value_counts() cannot be applied to the entire df.

user3329732

2 Answers

Using pandas.DataFrame.eq:

new_df = df[df.eq('Species4').sum(1).le(1)]
# or
new_df = df[~df.eq('Species4').sum(1).gt(1)]
print(new_df)

Output:

     population     plot1     plot2     plot3     plot4
0   Population1  Species1  Species1  Species2  Species2
2   Population3  Species1  Species2  Species1  Species2
4   Population5  Species2  Species2  Species4  Species2
6   Population7  Species3  Species4  Species1  Species3
8   Population9  Species3  Species4  Species2  Species3
9  Population10  Species1  Species3  Species2  Species4
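
To make the one-liner above easier to follow, here is a minimal sketch that breaks it into steps. It rebuilds only the first five rows of the question's data, and the intermediate names (is_sp4, counts, keep) are introduced here for illustration only.

```python
import pandas as pd

# First five rows of the question's data, rebuilt for illustration.
df = pd.DataFrame({
    "population": ["Population1", "Population2", "Population3",
                   "Population4", "Population5"],
    "plot1": ["Species1", "Species4", "Species1", "Species4", "Species2"],
    "plot2": ["Species1", "Species2", "Species2", "Species4", "Species2"],
    "plot3": ["Species2", "Species3", "Species1", "Species4", "Species4"],
    "plot4": ["Species2", "Species4", "Species2", "Species4", "Species2"],
})

# Step 1: boolean frame, True wherever a cell equals "Species4".
is_sp4 = df.eq("Species4")

# Step 2: summing across columns (axis=1) counts the True values per row.
counts = is_sp4.sum(axis=1)

# Step 3: keep rows in which "Species4" occurs at most once.
keep = counts.le(1)
new_df = df[keep]

print(counts.tolist())                # [0, 2, 0, 4, 1]
print(new_df["population"].tolist())  # ['Population1', 'Population3', 'Population5']
```

The whole operation is vectorized, so no Python-level loop over rows is needed. The population column is compared as well, but it never equals "Species4", so it does not affect the counts.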
Chris

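Use boolean indexing. Conditions can be combined with & (and), | (or) and ~ (not), which also covers multiple conditions or any other combination of boolean masks. For example, this selects the rows in which both Species4 and Species1 occur more than once: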

df[((df == "Species4").sum(axis=1) > 1) & ((df == "Species1").sum(axis=1) > 1)]
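
As a rough sketch of how such masks combine, the example below names each condition and then applies &, | and ~ to them. It uses only the first five rows of the question's data. The mask names and the use of df.filter(like="plot") to restrict the comparison to the plot columns are choices made here for illustration, not part of the original answer.

```python
import pandas as pd

# First five rows of the question's data, rebuilt for illustration.
df = pd.DataFrame({
    "population": ["Population1", "Population2", "Population3",
                   "Population4", "Population5"],
    "plot1": ["Species1", "Species4", "Species1", "Species4", "Species2"],
    "plot2": ["Species1", "Species2", "Species2", "Species4", "Species2"],
    "plot3": ["Species2", "Species3", "Species1", "Species4", "Species4"],
    "plot4": ["Species2", "Species4", "Species2", "Species4", "Species2"],
})

# Compare only the plot columns (assumes their names contain "plot").
plots = df.filter(like="plot")

# One named boolean mask per condition.
many_sp4 = (plots == "Species4").sum(axis=1) > 1
many_sp1 = (plots == "Species1").sum(axis=1) > 1

both = df[many_sp4 & many_sp1]    # rows meeting both conditions
either = df[many_sp4 | many_sp1]  # rows meeting at least one condition
without_sp4 = df[~many_sp4]       # rows where Species4 occurs at most once

print(both.index.tolist())         # []
print(either.index.tolist())       # [0, 1, 2, 3]
print(without_sp4.index.tolist())  # [0, 2, 4]
```

On this data the & combination is empty, because no row has both species twice across four plots. The ~many_sp4 mask is the one that answers the original question.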

zong fan