dataframe remove rows with less than 5 duplicate values

Question

This is how my dataset looks: [image: dataset sample]

I am trying to remove every player who has fewer than 5 years of data (fewer than 5 rows with the same name) from the whole dataset. In the sample snapshot, for example, the A.C. Green rows should be left untouched.
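A minimal sketch of the intended result, assuming the name column is called `Player` (as in the code below) and using invented toy rows. It counts rows per name with `value_counts` and keeps the names that appear at least 5 times via `isin`. This is one common pandas idiom, not the approach tried in the question:

```python
import pandas as pd

# Hypothetical toy data: "A.C. Green" has 5 rows, "Player B" only 2.
playersData = pd.DataFrame({
    "Player": ["A.C. Green"] * 5 + ["Player B"] * 2,
    "Year": [1986, 1987, 1988, 1989, 1990, 1999, 2000],
})

# Count rows per player, then keep the names that appear at least 5 times.
counts = playersData["Player"].value_counts()
keep = counts[counts >= 5].index
playersData = playersData[playersData["Player"].isin(keep)]

print(playersData.shape)  # (5, 2)
```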

I have tried this code from a similar question ("How can I remove rows where frequency of the value is less than 5? Python, Pandas"):

n = playersData[['Player']]
playersData[n.replace(n.apply(pd.Series.value_counts)).gt(5).all(1)]

but `playersData.shape` shows that no rows were removed.

Hi and welcome to Stack Overflow. First, please don't paste images of your dataframe; see [how to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). Second, is this what you're looking for? `playersData.groupby("Player").filter(lambda x: len(x) >= 5)` - Ric S, Jul 25 '20 at 13:15
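A small sketch of the `groupby(...).filter(...)` suggestion from the comment above, run on invented toy data. It also shows `transform("size")`, a vectorized alternative that avoids calling a Python lambda once per group:

```python
import pandas as pd

# Hypothetical toy data with one long career and one short one.
playersData = pd.DataFrame({
    "Player": ["A.C. Green"] * 5 + ["Player B"] * 2,
    "Year": [1986, 1987, 1988, 1989, 1990, 1999, 2000],
})

# filter() calls the lambda once per player group and keeps the whole group
# only when it returns True, i.e. when the player has at least 5 rows.
filtered = playersData.groupby("Player").filter(lambda g: len(g) >= 5)

# Equivalent vectorized mask: transform("size") broadcasts each group's
# row count back onto every row of that group.
mask = playersData.groupby("Player")["Player"].transform("size") >= 5
assert filtered.equals(playersData[mask])

print(len(filtered))  # 5
```

`filter` is arguably easier to read; the `transform` mask typically scales better on large frames because the per-group count is computed in compiled code rather than in a Python callback.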

score 0 · Accepted Answer · answered Jul 25 '20 at 13:15


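Did you try assigning the result? Indexing with `playersData[...]` returns a new DataFrame and leaves `playersData` itself unchanged, which is why its shape does not change. Also note that `.gt(5)` means "more than 5", so players with exactly 5 entries would be dropped; `.ge(5)` keeps them: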

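    n = playersData[['Player']]
    # Replace each name with its row count; .ge(5) keeps players with at least 5 rows.
    # Assign back to playersData: indexing alone returns a new frame.
    playersData = playersData[n.replace(n.apply(pd.Series.value_counts)).ge(5).all(1)]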
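The effect of assignment, and of `.gt(5)` versus `.ge(5)`, can be checked on a small frame. The two-player data below is invented for illustration; the filtering expression is the `replace`/`value_counts` trick from the question, split into steps:

```python
import pandas as pd

# Hypothetical toy data: "A.C. Green" has exactly 5 rows, "Player B" has 2.
playersData = pd.DataFrame({
    "Player": ["A.C. Green"] * 5 + ["Player B"] * 2,
    "Year": [1986, 1987, 1988, 1989, 1990, 1999, 2000],
})
n = playersData[["Player"]]

# value_counts per column gives a name -> count table; replace() swaps each
# name in n for its count, so every row now holds its player's row count.
row_counts = n.replace(n.apply(pd.Series.value_counts))

# Indexing without assignment builds a new frame; playersData is unchanged.
playersData[row_counts.gt(5).all(axis=1)]
print(playersData.shape)  # (7, 2)

# .gt(5) means "more than 5" and drops A.C. Green's 5 rows as well.
print(len(playersData[row_counts.gt(5).all(axis=1)]))  # 0

# .ge(5) keeps players with at least 5 rows; assign the result back.
playersData = playersData[row_counts.ge(5).all(axis=1)]
print(playersData.shape)  # (5, 2)
```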


Hussein Fawzy


