How to select rows in Pandas dataframe where value appears more than once

Question

Let's say I have a pandas DataFrame with a column of measurement parameters and a column of the corresponding measurement values.

ID     Parameter     Value
0      'A'           4.3
1      'B'           3.1
2      'C'           8.9
3      'A'           2.1
4      'A'           3.9
.      .             .
.      .             .
.      .             .
100    'B'           3.8

How can I filter this dataframe to keep only the parameters that appear more than X times? For example, for this dataframe I want all rows whose parameter has more than 5 measurements (let's say only parameters 'A' and 'B' appear more than 5 times), giving a dataframe like the one below.

ID     Parameter     Value
0      'A'           4.3
1      'B'           3.1
3      'A'           2.1
.      .             .
.      .             .
.      .             .
100    'B'           3.8

Possible duplicate of [Pandas: Selecting rows based on value counts of a particular column](https://stackoverflow.com/questions/36166090/pandas-selecting-rows-based-on-value-counts-of-a-particular-column) — m0nhawk, Feb 05 '18 at 17:47

cs95 · Accepted Answer · 2018-02-05T17:51:16.550

You can use value_counts + isin -

v = df.Parameter.value_counts()
df[df.Parameter.isin(v.index[v.gt(5)])]

For example, with a threshold of 2 (keep all parameters that have more than 2 readings) -

df

   ID Parameter  Value
0   0         A    4.3
1   1         B    3.1
2   2         C    8.9
3   3         A    2.1
4   4         A    3.9
5   5         B    4.5

v = df.Parameter.value_counts()
v

A    3
B    2
C    1
Name: Parameter, dtype: int64

df[df.Parameter.isin(v.index[v.gt(2)])]

   ID Parameter  Value
0   0         A    4.3
3   3         A    2.1
4   4         A    3.9
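If you need this for several columns or thresholds, the same value_counts + isin idea can be wrapped in a small helper. This is only a sketch: the function name `rows_with_frequent_values` and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd


def rows_with_frequent_values(df, column, min_count):
    """Keep rows whose value in `column` occurs more than `min_count` times.

    Hypothetical helper, not a pandas API.
    """
    counts = df[column].value_counts()
    # Labels whose count is strictly greater than the threshold
    frequent = counts.index[counts.gt(min_count)]
    return df[df[column].isin(frequent)]


df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

# More than 2 readings: only 'A' qualifies
result = rows_with_frequent_values(df, "Parameter", 2)

# More than 1 reading ("appears more than once"): 'A' and 'B' qualify
more_than_once = rows_with_frequent_values(df, "Parameter", 1)
```

The original index is preserved, so the rows come back in their original order.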

score 16 · Answer 2 · answered Feb 05 '18 at 17:45


Use transform + size with boolean indexing:

df[df.groupby('Parameter')['Parameter'].transform('size') > 5]
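To see what the mask looks like, here is a small runnable sketch on the six-row sample frame from the accepted answer, with a threshold of 2 instead of 5 (the sample data and the variable names are only for illustration). `transform('size')` returns a Series aligned with the original rows, where each row holds the size of its group.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

# One group size per original row, aligned on the original index
group_sizes = df.groupby("Parameter")["Parameter"].transform("size")

# Boolean indexing: keep rows whose group has more than 2 members
result = df[group_sizes > 2]
```

Because the mask shares the frame's index, it can be combined with other conditions using `&` and `|`.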


jezrael


score 6 · Answer 3 · answered Feb 05 '18 at 17:49


By using filter

df.groupby('Parameter').filter(lambda x: len(x) > 5)
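A minimal runnable sketch of the filter approach, using the six-row sample frame from the accepted answer and a threshold of 2 (the sample data is only for illustration). The lambda receives each group as a DataFrame and must return a single boolean. A strict `>` corresponds to "more than N times"; `>=` would also keep groups with exactly N rows.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5],
    "Parameter": ["A", "B", "C", "A", "A", "B"],
    "Value": [4.3, 3.1, 8.9, 2.1, 3.9, 4.5],
})

# Keep every row of each group that has more than 2 rows
strictly_more = df.groupby("Parameter").filter(lambda g: len(g) > 2)

# With >=, groups of exactly 2 rows ('B') are kept as well
at_least = df.groupby("Parameter").filter(lambda g: len(g) >= 2)
```

The lambda runs once per group in Python, so this can get slow when there are many distinct groups.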


BENY


Unfortunately really slow, but plus1 – jezrael Feb 05 '18 at 17:49

Jerrold110 · Answer 4 · 2023-06-16T04:15:07.460

3

You can use value_counts() and some Series manipulation to get the rows of a DataFrame, with their original indexes, where the value in a particular column appears more than once.

freq = DF['attribute'].value_counts()
# index of items that appear more than once
items = freq[freq > 1].index
more_than_1_df = DF[DF['attribute'].isin(items)]
more_than_1_df
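For the specific "more than once" case, `Series.duplicated(keep=False)` is a different technique that should give the same rows, since it marks every occurrence of a repeated value. Below is a small sketch comparing the two on made-up data (the column contents are only for illustration). The two can differ when the column contains missing values, because value_counts() drops NaN by default.

```python
import pandas as pd

DF = pd.DataFrame({
    "attribute": ["x", "y", "x", "z", "y", "x"],
    "value": [0, 1, 2, 3, 4, 5],
})

# Approach from this answer: count values, keep those seen more than once
freq = DF["attribute"].value_counts()
items = freq[freq > 1].index
via_counts = DF[DF["attribute"].isin(items)]

# Alternative: keep=False flags every occurrence of a repeated value
via_duplicated = DF[DF["attribute"].duplicated(keep=False)]
```

Only the value_counts() version generalizes to other thresholds such as "more than 5 times".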

edited Jun 16 '23 at 04:15

answered Oct 18 '21 at 03:11


score 2 · Answer 5 · answered Feb 05 '18 at 17:51


Loc with count could also work

df.loc[df.Parameter.isin(df.groupby('Parameter').size().loc[lambda s: s > 5].index)]


Espoir Murhabazi

