I have a DataFrame that has duplicated rows. I'd like to get a DataFrame with a unique index and no duplicates. It's ok to discard the duplicated values. Is this possible? Would it be done with groupby?

JJJ
Adam Greenhall

2 Answers

In [29]: df.drop_duplicates()
Out[29]: 
   b  c
1  2  3
3  4  0
7  5  9
Wouter Overmeire
  • It's worthwhile to note this takes either the first or last occurrence. So you need to sort by some other quantity first (if you're lucky) or do some complicated groupby logic anyway. – ely Sep 08 '12 at 02:20
  • 2
    This is wrong. drop_duplicates acts on the values only (at least in my version). You need to reset_index if you want to drop on index and values or just work with the index if you want to have a unique index. Maybe there is another way besides groupby to enforce unique index? – mathtick Jul 11 '13 at 14:02
  • 1
    Use `df.drop_duplicates(inplace=True)` if you don't want to assign a new variable. – Flavian Hautbois Mar 23 '15 at 11:22
  • This does not give a DataFrame with a unique index; the solution by @Adam Greenhall below, however, works for that. – dashesy Apr 12 '15 at 18:21
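To illustrate what the comments above mean: `drop_duplicates` compares row values only, not the index, and the `keep` parameter (in reasonably recent pandas versions) controls which occurrence survives. A minimal sketch, using a small made-up frame like the one in the other answer:

```python
import pandas as pd

df = pd.DataFrame({'b': [2, 2, 4, 5], 'c': [3, 3, 0, 9]}, index=[1, 1, 3, 7])

# Values-only deduplication; keep='first' (the default) or keep='last'
# decides which of the duplicated rows is retained.
deduped = df.drop_duplicates(keep='first')

# To make the index count as part of the row when checking for
# duplicates, move it into a column first, then restore it.
deduped_with_index = (df.reset_index()
                        .drop_duplicates()
                        .set_index('index'))
```

Here both give the same three rows, because the duplicated rows also share an index label; they differ when equal values sit under different labels.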

Figured out one way to do it by reading the split-apply-combine documentation examples.

import pandas

df = pandas.DataFrame({'b': [2, 2, 4, 5], 'c': [3, 3, 0, 9]}, index=[1, 1, 3, 7])
df_unique = df.groupby(level=0).first()

df
   b  c
1  2  3
1  2  3
3  4  0
7  5  9

df_unique
   b  c
1  2  3
3  4  0
7  5  9
Adam Greenhall
  • This relies on the row index being duplicated for rows where the data fields (b,c) are duplicated, effectively making the index part of your row as vector that you want to be unique (not duplicated). – hobs Nov 01 '12 at 20:32
  • 4
    If you have duplicated index entries, this is the answer you want. – rogueleaderr Jun 04 '14 at 00:59
  • I was getting `ValueError: Index contains duplicate entries, cannot reshape` when doing `unstack` on a MultiIndex, but this solution works for that too; I only had to do `df_unique = df.groupby(level=[0,1]).first()`. – dashesy Apr 12 '15 at 18:19
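For the specific goal of a unique index, an alternative worth knowing (assuming a reasonably recent pandas) is a boolean mask built from `Index.duplicated`, which avoids the groupby entirely. A sketch using the same example frame:

```python
import pandas as pd

df = pd.DataFrame({'b': [2, 2, 4, 5], 'c': [3, 3, 0, 9]}, index=[1, 1, 3, 7])

# groupby approach from the answer above: first row per index label.
df_unique = df.groupby(level=0).first()

# Equivalent here: mark every repeat of an index label and keep only
# the first occurrence, preserving the original row order.
df_unique2 = df[~df.index.duplicated(keep='first')]
```

Note a subtle difference: `groupby(...).first()` returns the first non-null value per column within each group, while the mask keeps whole rows as-is, so they can diverge when the duplicated rows contain NaNs.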