How to remove duplicate values using pandas and keep any one

Question

I have a DataFrame that looks like this:

A       B       C       D       E
a       aa      1       2       3
b       aa      4       5       6
c       cc      7       8       9
d       cc      11      10      3
e       dd      71      81      91

Rows (1, 2) and rows (3, 4) have duplicate values in column B, and I want to keep only one row from each pair.

The final output should be:

A       B       C       D       E
a       aa      1       2       3
c       cc      7       8       9
e       dd      71      81      91

How can I use pandas to accomplish this?

score 3 · Answer 1 · answered Oct 03 '20 at 18:32

Try drop_duplicates, which by default keeps the first row for each value of B:

df = df.drop_duplicates('B')
   A   B   C   D   E
0  a  aa   1   2   3
2  c  cc   7   8   9
4  e  dd  71  81  91

BENY

score 3 · Accepted Answer · answered Oct 03 '20 at 18:37

DataFrame.drop_duplicates(subset="B", keep='first')

keep: controls which of the duplicated rows is retained.

It accepts three values, and the default is 'first'.
If 'first', the first occurrence is kept and the later rows with the same value are dropped.
If 'last', the last occurrence is kept and the earlier rows with the same value are dropped.
If False, every row whose value is repeated is dropped, so none of the duplicates is kept.
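
As a minimal sketch of the three keep settings, the snippet below rebuilds the frame from the question by hand (the construction of df is an assumption, since the question only shows the printed table) and applies each option to column B:

```python
import pandas as pd

# Rebuild the frame from the question (assumed construction).
df = pd.DataFrame({
    "A": ["a", "b", "c", "d", "e"],
    "B": ["aa", "aa", "cc", "cc", "dd"],
    "C": [1, 4, 7, 11, 71],
    "D": [2, 5, 8, 10, 81],
    "E": [3, 6, 9, 3, 91],
})

# Keep the first row of each group of equal B values.
first = df.drop_duplicates(subset="B", keep="first")
# Keep the last row of each group of equal B values.
last = df.drop_duplicates(subset="B", keep="last")
# Drop every row whose B value appears more than once.
none = df.drop_duplicates(subset="B", keep=False)

print(first["A"].tolist())  # ['a', 'c', 'e']
print(last["A"].tolist())   # ['b', 'd', 'e']
print(none["A"].tolist())   # ['e']
```

Since the question asks to keep any one row per value, either 'first' or 'last' would do; keep=False would not, because it discards the duplicated values entirely.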

score 2 · Answer 3 · answered Oct 03 '20 at 18:37

In the general case, you may need to drop duplicates across multiple columns. In that case, use the following:

df.drop_duplicates(subset=['A', 'C'], keep='first')

We specify the column names in the subset argument, and we use the keep argument to say which occurrence to keep:

first : Drop duplicates except for the first occurrence.
last : Drop duplicates except for the last occurrence.
False : Drop all duplicates.
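
To illustrate the multi-column case, here is a small sketch on a hypothetical frame (the data below is made up for the example and is not from the question). A row counts as a duplicate only when it matches another row on every column listed in subset:

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 agree on both A and C;
# row 2 shares A with them but differs on C.
df = pd.DataFrame({
    "A": ["x", "x", "x", "y"],
    "C": [1, 1, 2, 1],
    "D": [10, 20, 30, 40],
})

first = df.drop_duplicates(subset=["A", "C"], keep="first")
last = df.drop_duplicates(subset=["A", "C"], keep="last")
none = df.drop_duplicates(subset=["A", "C"], keep=False)

print(first["D"].tolist())  # [10, 30, 40]
print(last["D"].tolist())   # [20, 30, 40]
print(none["D"].tolist())   # [30, 40]
```

Row 2 survives in every case because its (A, C) pair is unique, even though its A value alone is repeated.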
