
I have a data-frame which looks like:

A       B       C       D       E
a       aa      1       2       3
b       aa      4       5       6
c       cc      7       8       9
d       cc      11      10      3
e       dd      71      81      91

Rows (1, 2) and rows (3, 4) have duplicate values in column B. I want to keep only one row from each group.

The final output should be:

A       B       C       D       E
a       aa      1       2       3
c       cc      7       8       9
e       dd      71      81      91

How can I use pandas to accomplish this?

MAC

3 Answers


Try drop_duplicates

df = df.drop_duplicates('B')
   A   B   C   D   E
0  a  aa   1   2   3
2  c  cc   7   8   9
4  e  dd  71  81  91
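For a self-contained check, the sketch below rebuilds the question's DataFrame by hand (the construction is an assumption, since the question does not show how df was created) and applies the same call. Note that the original index labels 0, 2, 4 are preserved; reset_index(drop=True) renumbers them if that matters to you.

```python
import pandas as pd

# Rebuild the example DataFrame from the question (assumed construction)
df = pd.DataFrame({
    "A": ["a", "b", "c", "d", "e"],
    "B": ["aa", "aa", "cc", "cc", "dd"],
    "C": [1, 4, 7, 11, 71],
    "D": [2, 5, 8, 10, 81],
    "E": [3, 6, 9, 3, 91],
})

# Keep the first row for each distinct value in column B
result = df.drop_duplicates("B")
print(result)

# Optional: renumber the index 0..n-1 instead of keeping the original labels
renumbered = result.reset_index(drop=True)
print(renumbered)
```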
BENY
DataFrame.drop_duplicates(subset="B", keep='first')

keep: controls which occurrence of each repeated value is kept.

  1. It accepts one of three values: 'first' (the default), 'last', or False.

  2. If 'first', the first occurrence is treated as unique and all later occurrences of the same value are marked as duplicates.

  3. If 'last', the last occurrence is treated as unique and all earlier occurrences are marked as duplicates. If False, every occurrence of a repeated value is marked as a duplicate, so none of them is kept.
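To see the three keep options side by side, the sketch below uses DataFrame.duplicated, which returns a boolean mask (True marks a row treated as a duplicate); drop_duplicates keeps exactly the rows where that mask is False. Only columns A and B of the question's data are reproduced here, as an assumption for brevity.

```python
import pandas as pd

# Columns A and B from the question's data (other columns omitted for brevity)
df = pd.DataFrame({
    "A": ["a", "b", "c", "d", "e"],
    "B": ["aa", "aa", "cc", "cc", "dd"],
})

results = {}
for keep in ("first", "last", False):
    # True marks a row that drop_duplicates would remove for this keep value
    mask = df.duplicated(subset="B", keep=keep)
    kept = df.drop_duplicates(subset="B", keep=keep)
    results[keep] = kept["A"].tolist()
    print(f"keep={keep!r}: duplicated={mask.tolist()}, kept A={results[keep]}")
```

With 'first' the rows a, c, e survive; with 'last' the rows b, d, e survive; with False only e survives, because it is the only B value that appears once.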

pradeexsu

In the more general case, duplicates have to be identified across multiple columns. In that case, use:

df = df.drop_duplicates(subset=['A', 'C'], keep='first')

The subset argument lists the columns used to identify duplicates, and the keep argument controls which occurrence is kept:

  • first : Drop duplicates except for the first occurrence.

  • last : Drop duplicates except for the last occurrence.

  • False : Drop all duplicates.
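With a multi-column subset, a row counts as a duplicate only if it matches an earlier row in every listed column. The sketch below illustrates this on a small hypothetical DataFrame (the data is made up for illustration and is not from the question), comparing subset=['A', 'C'] with subset=['A'] alone.

```python
import pandas as pd

# Hypothetical data: rows 0 and 2 match on both A and C;
# row 1 matches row 0 on A only, so it is not a duplicate under ['A', 'C']
df = pd.DataFrame({
    "A": ["x", "x", "x", "y"],
    "C": [1, 2, 1, 1],
    "E": [10, 20, 30, 40],
})

# Duplicates judged on the (A, C) pair: only row 2 is dropped
by_a_and_c = df.drop_duplicates(subset=["A", "C"], keep="first")
print(by_a_and_c)

# Duplicates judged on A alone: rows 1 and 2 are dropped
by_a_only = df.drop_duplicates(subset=["A"], keep="first")
print(by_a_only)
```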

Sivaram Rasathurai