Given a data frame with two columns A and B:

df = 

A      B
cat    3
cat    4
cat    2
bird   1
bird   3
bird   2
bird   5
bird   3

I want to delete the rows for every value in column A that does not appear more than 3 times. Here `cat` appears 3 times (delete) and `bird` appears 5 times (keep).

desired output:

df = 

A      B
bird   1
bird   3
bird   2
bird   5
bird   3
Mamed

2 Answers

Use filter:

result = df.groupby('A').filter(lambda x: len(x) > 3)
print(result)

Output

      A  B
3  bird  1
4  bird  3
5  bird  2
6  bird  5
7  bird  3
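
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same `filter` call. The frame is rebuilt from the question's data, and `min_rows` is a hypothetical name introduced here for the threshold; it is not part of the original answer.

```python
import pandas as pd

# Rebuild the frame from the question (3 cat rows, 5 bird rows).
df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

min_rows = 3  # hypothetical name: a group is kept only if it has MORE rows than this

# filter() calls the lambda once per group and keeps every row of the
# groups for which it returns True; the original index is preserved.
result = df.groupby("A").filter(lambda g: len(g) > min_rows)
print(result)
```

Because the lambda sees a whole group at a time, any per-group condition can go there, not just a row count.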

As an alternative you could use value_counts:

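# find the count of each value of A (a Series indexed by the values of A);
# working on the Series directly avoids to_frame(), whose column name
# differs across pandas versions
counts = df['A'].value_counts()

# keep those with count above 3
keep = counts[counts > 3].index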

# filter
result = df[df.A.isin(keep)]
print(result)
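
As a quick sanity check, the sketch below runs the `value_counts` approach next to the `filter` approach on the question's data and compares the rows they keep. The variable names `by_counts` and `by_filter` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

# Approach 1: count each value of A, then keep rows whose value is frequent enough.
counts = df["A"].value_counts()
keep = counts[counts > 3].index
by_counts = df[df["A"].isin(keep)]

# Approach 2: let groupby().filter() decide per group.
by_filter = df.groupby("A").filter(lambda g: len(g) > 3)

# Both approaches should select the same rows.
print(by_counts.index.tolist() == by_filter.index.tolist())
```

The `value_counts` version avoids calling a Python lambda per group, which may matter on frames with many distinct values in A, though that is an assumption worth measuring on your own data.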
Dani Mesejo

This question is a duplicate of Python: Removing Rows on Count condition


I'm sure there is a better way; I just haven't found it yet. I will keep searching.

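from io import StringIO  # needed to read the string as if it were a file

import pandas as pd

raw_str = \
'''
A      B
cat    3
cat    4
cat    2
bird   1
bird   3
bird   2
bird   5
bird   3'''

# sep=r'\s+' splits on runs of whitespace; it replaces delim_whitespace=True,
# which is deprecated in recent pandas versions
df_1 = pd.read_csv(StringIO(raw_str), sep=r'\s+', header=0, dtype={'A': str, 'B': int})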


val_counts = df_1['A'].value_counts()

df_1 = df_1[(val_counts[df_1['A']] > 3).reset_index(drop=True)]
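
One possible shorter spelling of that last line, offered as a sketch rather than as the "better way" the answer is looking for: `Series.map` accepts another Series and looks each value up in its index, so the per-row count can be built without `reset_index`. The frame is rebuilt inline here and the name `mask` is introduced only for this example.

```python
import pandas as pd

df_1 = pd.DataFrame({
    "A": ["cat"] * 3 + ["bird"] * 5,
    "B": [3, 4, 2, 1, 3, 2, 5, 3],
})

val_counts = df_1["A"].value_counts()

# map() replaces each value of A with its count, keeping df_1's own index,
# so the boolean mask lines up with the rows without any index reset.
mask = df_1["A"].map(val_counts) > 3
df_1 = df_1[mask]
print(df_1)
```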
AMC