Pandas DataFrame drop duplicates keeping first 'x' occurrences

Question

What I am looking for is a function that works exactly like pandas.DataFrame.drop_duplicates() but lets me keep not only the first occurrence but the first 'x' occurrences (say, 10). Does anything like that exist? Thanks for your help!

sacuL · Accepted Answer · 2019-02-18T23:40:26.100

IIUC, one way to do this is with a groupby followed by head, which selects the first x occurrences of each value. As noted in the docs, head:

Returns first n rows of each group.

Sample code:

x = 10
df.groupby('col').head(x)

Here, col is the column you want to check for duplicates, and x is the number of occurrences you want to keep for each value in col.
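To mirror the subset argument of drop_duplicates, the same idea can be wrapped in a small helper. This is only a minimal sketch: the name keep_first_n and its parameters are hypothetical (not part of pandas), the sample frame is made up for illustration, and dropna=False assumes pandas 1.1 or later.

```python
import pandas as pd


def keep_first_n(df, subset, n):
    """Keep the first n rows for each distinct value (or combination) in subset.

    Hypothetical helper, not a pandas API. Rows keep their original order and index.
    """
    # sort=False skips sorting the group keys; head() returns rows in their
    # original order either way.
    # dropna=False (assumes pandas >= 1.1) keeps rows whose key is NaN
    # instead of silently dropping them.
    return df.groupby(subset, sort=False, dropna=False).head(n)


# Made-up sample data for illustration.
df = pd.DataFrame({"col": ["a", "a", "a", "b", "a", "b"], "val": range(6)})

result = keep_first_n(df, "col", 2)
print(result.index.tolist())  # [0, 1, 3, 5]
```

With n=1 this should give the same rows as df.drop_duplicates(subset="col"), and subset can also be a list of column names.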

For instance:

In [81]: df.head()
Out[81]:
   a         b
0  3  0.912355
1  3  2.091888
2  3 -0.422637
3  1 -0.293578
4  2 -0.817454
....

# keep the first 3 instances of each value in column a:

x = 3
df.groupby('a').head(x)

Out[82]:
   a         b
0  3  0.912355
1  3  2.091888
2  3 -0.422637
3  1 -0.293578
4  2 -0.817454
5  1  1.476599
6  1  0.898684
8  2 -0.824963
9  2 -0.290499
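The example above can be reproduced with a self-contained sketch. The original frame isn't shown in full, so the data here is an assumption: column a is chosen to be consistent with the output above (with a fourth 3 at index 7), and column b holds placeholder values rather than the original numbers. It also shows two closely related variations: an equivalent boolean mask built with cumcount, and tail for keeping the last x rows instead.

```python
import pandas as pd

# Assumed data: column "a" is chosen to be consistent with the output shown above;
# column "b" holds placeholder values, not the original numbers.
df = pd.DataFrame({"a": [3, 3, 3, 1, 2, 1, 1, 3, 2, 2], "b": range(10)})

x = 3

# Keep the first x rows of each group; original order and index are preserved.
first_x = df.groupby("a").head(x)

# Equivalent boolean mask: cumcount() numbers the rows within each group from 0.
same = df[df.groupby("a").cumcount() < x]

# tail() keeps the last x rows of each group instead of the first.
last_x = df.groupby("a").tail(x)

print(first_x.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 8, 9]
print(last_x.index.tolist())   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Only the row at index 7 (the fourth 3) is dropped by head, and only the row at index 0 by tail, since the other values occur no more than three times.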

Yes, that's exactly what I was looking for. It perfectly solves the problem. Thanks! — Fulvio, Feb 19 '19 at 02:40
