Delete rows based on unique number of elements in a column pandas

Question

Given a data frame with two columns A and B:

df = 

A      B
cat    3
cat    4
cat    2
bird   1
bird   3
bird   2
bird   5
bird   3

I want to delete the rows for any value in column A that appears too few times (here, 3 times or fewer). cat appears 3 times (delete); bird appears 5 times (keep).

desired output:

df = 

A      B
bird   1
bird   3
bird   2
bird   5
bird   3

This is a duplicate: https://stackoverflow.com/q/49735683/11301900. — AMC, Dec 01 '19 at 02:42

score 2 · Accepted Answer · answered Dec 01 '19 at 01:41

Use filter:

result = df.groupby('A').filter(lambda x: len(x) > 3)
print(result)

Output

      A  B
3  bird  1
4  bird  3
5  bird  2
6  bird  5
7  bird  3
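For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame and applies the same filter. The name min_rows is only an illustrative variable, not something from the question, and the comparison is strict (>) to match the answer above.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

# Hypothetical threshold name: a group is kept only if it has MORE rows than this
min_rows = 3

# filter() drops every group for which the function returns False
# and keeps the original row order and index of the surviving rows
result = df.groupby("A").filter(lambda g: len(g) > min_rows)
print(result)
```

Note that filter keeps the original index labels (3 to 7 here); call reset_index(drop=True) on the result if a fresh 0-based index is wanted.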

As an alternative you could use value_counts:

# count the rows for each value of A
# (kept as a Series: in pandas 2.0+ to_frame() names the column 'count', so counts.A would fail)
counts = df['A'].value_counts()

# keep the values whose count is above 3
keep = counts[counts > 3].index

# filter
result = df[df.A.isin(keep)]
print(result)
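Another option, not from the answer above, is groupby with transform('size'), which attaches each group's row count to every row so the result can be used directly as a boolean mask. This is a sketch assuming the same example frame; group_sizes is just an illustrative name.

```python
import pandas as pd

# Same example frame as in the question
df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

# For each row, the number of rows that share its value of A
group_sizes = df.groupby("A")["A"].transform("size")

# Keep only rows whose group has more than 3 rows
result = df[group_sizes > 3]
print(result)
```

This avoids calling a Python lambda once per group, which may matter on frames with many distinct values in A.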

AMC · Answer 2 · 2019-12-01T02:47:34.120

This question is a duplicate of "Python: Removing Rows on Count condition".

I'm sure there is a better way, I just haven't quite found it yet. I will keep searching.

import pandas as pd
from io import StringIO

raw_str = \
'''
A      B
cat    3
cat    4
cat    2
bird   1
bird   3
bird   2
bird   5
bird   3'''

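# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
df_1 = pd.read_csv(StringIO(raw_str), sep=r'\s+', header=0, dtype={'A': str, 'B': int})

# number of rows for each value of A
val_counts = df_1['A'].value_counts()

# map each row's A value to its count and keep the rows whose count exceeds 3
df_1 = df_1[df_1['A'].map(val_counts) > 3]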
