Is there an easy way to eliminate duplicate rows from a pandas DataFrame in Python?

Question

My problem is that my data doesn't accurately represent what is really going on, because it contains a lot of duplicate rows. Consider the following:

I want to keep only one copy of each row and eliminate all of the duplicates. When that's done, it should look like this:

    a    b
1  23   42
2  14   12

Is there a function to do this?

Scott Boston · Answer 1 · 2017-06-12T20:18:50.097


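Use drop_duplicates with keep='first', which keeps the first occurrence of each duplicated row and drops the rest (keep='first' is the default, so df2.drop_duplicates() does the same thing):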

    df2.drop_duplicates(keep='first')

Output:

    a   b
1  23  42
4  14  12
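Since the real data wasn't posted, here is a minimal sketch on made-up values (the numbers in df2 below are hypothetical, chosen only to reproduce the output above). It also shows the other keep options, and reset_index(drop=True) for when you want the surviving rows renumbered instead of keeping their original labels 1 and 4:

```python
import pandas as pd

# Hypothetical data: the asker's real frame wasn't shown, so these values
# are invented to match the output above (rows 1-3 and rows 4-5 repeat).
df2 = pd.DataFrame(
    {"a": [23, 23, 23, 14, 14], "b": [42, 42, 42, 12, 12]},
    index=[1, 2, 3, 4, 5],
)

# keep='first' (the default) keeps the first occurrence of each repeated row
first = df2.drop_duplicates(keep="first")
print(first)
#     a   b
# 1  23  42
# 4  14  12

# keep='last' keeps the last occurrence instead (index labels 3 and 5)
last = df2.drop_duplicates(keep="last")

# keep=False drops every row that has a duplicate anywhere; in this
# made-up data every row is repeated, so the result is empty
none_kept = df2.drop_duplicates(keep=False)

# Surviving rows keep their original labels; reset_index(drop=True)
# discards them and renumbers the rows 0, 1, ...
renumbered = first.reset_index(drop=True)
```

Note that drop_duplicates returns a new DataFrame; assign the result (or pass inplace=True) if you want to keep it.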

edited Jun 12 '17 at 20:18

answered Jun 12 '17 at 20:09

Scott Boston

Great answer and I'm sure it works... but `ValueError: Buffer has wrong number of dimensions (expected 1, got 2)`. – Ravaal Jun 12 '17 at 20:20
Can you print the head of your actual data? – Scott Boston Jun 12 '17 at 20:24
I can't. It's a security risk. – Ravaal Jun 12 '17 at 20:29
MultiIndex data? See if you can recreate dummy data that reproduces this error. – Scott Boston Jun 12 '17 at 20:30
@NickTheInventor You could try something like df2.drop_duplicates(subset=['a','b']) if you don't want to consider all the columns. – Scott Boston Jun 12 '17 at 21:46
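A sketch of what subset does, assuming a hypothetical extra column c that differs between rows that are otherwise identical in a and b. Without subset no row is an exact duplicate, so nothing is dropped; with subset=['a','b'] only those two columns are compared. DataFrame.duplicated takes the same subset/keep arguments and returns a boolean Series marking the rows drop_duplicates would remove, which can help you inspect the data before dropping anything:

```python
import pandas as pd

# Hypothetical data: column "c" is invented to show rows that repeat
# in "a" and "b" but are not exact duplicates across all columns.
df2 = pd.DataFrame(
    {
        "a": [23, 23, 14, 14],
        "b": [42, 42, 12, 12],
        "c": ["x", "y", "x", "z"],
    }
)

# Comparing all columns: no two rows are identical, so all 4 rows remain
all_cols = df2.drop_duplicates()

# Comparing only "a" and "b": rows 1 and 3 repeat rows 0 and 2 and are dropped
by_ab = df2.drop_duplicates(subset=["a", "b"])

# duplicated() flags the rows (True) that the call above drops
mask = df2.duplicated(subset=["a", "b"])
print(mask.tolist())  # [False, True, False, True]
```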
