I have a data frame with 5 columns and a large number of rows:

     A     B     C     D     E
aa   hi    43    21    22    45
ab   helo  44    65    86    94
ac   hola  42    71    91    44
ad   hi    12    79    45    12
ae   hey   81    14    34    42
af   hi    21    45    12    02
ag   hola  04    12    39    65

I want to remove repeated values in column A, keeping the first row for each value and dropping the rest, so I expect a data frame like this:

     A     B     C     D     E
aa   hi    43    21    22    45
ab   helo  44    65    86    94
ac   hola  42    71    91    44
ae   hey   81    14    34    42
Biswankar Das
  • Possible duplicate of [python pandas: Remove duplicates by columns A, keeping the row with the highest value in column B](https://stackoverflow.com/questions/12497402/python-pandas-remove-duplicates-by-columns-a-keeping-the-row-with-the-highest) – Zero Jul 05 '17 at 19:19
  • See https://stackoverflow.com/questions/23667369/drop-all-duplicate-rows-in-python-pandas – Zero Jul 05 '17 at 19:21

1 Answer

Use drop_duplicates with the subset parameter to name the column(s) that are checked for duplicates:

df = df.drop_duplicates(subset=['A'])
# same as keep='first'; it is the default, so it can be omitted
# df = df.drop_duplicates(subset=['A'], keep='first')
print(df)
       A   B   C   D   E
aa    hi  43  21  22  45
ab  helo  44  65  86  94
ac  hola  42  71  91  44
ae   hey  81  14  34  42
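
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the question's frame and applies the call above. The index labels and integer dtypes are assumptions read off the printed table (so 02 and 04 become 2 and 4), and the name `first` is only illustrative.

```python
import pandas as pd

# Rebuild the sample data; index labels and int dtypes are assumed from the printout.
df = pd.DataFrame(
    {
        "A": ["hi", "helo", "hola", "hi", "hey", "hi", "hola"],
        "B": [43, 44, 42, 12, 81, 21, 4],
        "C": [21, 65, 71, 79, 14, 45, 12],
        "D": [22, 86, 91, 45, 34, 12, 39],
        "E": [45, 94, 44, 12, 42, 2, 65],
    },
    index=["aa", "ab", "ac", "ad", "ae", "af", "ag"],
)

# Keep the first row for each distinct value in column A.
first = df.drop_duplicates(subset=["A"])
print(first)
```

The original row order and index labels are preserved; only the later rows with an already-seen value in A are dropped.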

It is also possible to keep only the last row for each value:

df = df.drop_duplicates('A', keep='last')
print(df)
       A   B   C   D   E
ab  helo  44  65  86  94
ae   hey  81  14  34  42
af    hi  21  45  12   2
ag  hola   4  12  39  65
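
The keep parameter also accepts False, which drops every row whose value in A is repeated. The sketch below shows that, plus an equivalent way to get the keep='last' result with a boolean mask from duplicated. The frame is a trimmed copy of the sample data (columns A and B only), and the names `unique_only`, `mask` and `last` are illustrative.

```python
import pandas as pd

# Trimmed copy of the sample data: only columns A and B, same assumed index labels.
df = pd.DataFrame(
    {
        "A": ["hi", "helo", "hola", "hi", "hey", "hi", "hola"],
        "B": [43, 44, 42, 12, 81, 21, 4],
    },
    index=["aa", "ab", "ac", "ad", "ae", "af", "ag"],
)

# keep=False removes all rows whose A value occurs more than once.
unique_only = df.drop_duplicates(subset="A", keep=False)
print(unique_only)

# duplicated() marks the rows that drop_duplicates would remove,
# so inverting the mask gives the same rows as keep='last'.
mask = df.duplicated(subset="A", keep="last")
last = df[~mask]
print(last)
```

The mask form can be handy when the duplicate flag needs to be combined with other conditions before filtering.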
jezrael