
I need to check whether there are duplicate values in one column of a DataFrame using pandas and, if there are any, delete the entire row. I only need to check the first column.

Example:

object    type

apple     fruit
ball      toy
banana    fruit
xbox      videogame
banana    fruit
apple     fruit

What I need is:

object    type

apple     fruit
ball      toy
banana    fruit
xbox      videogame

I can delete the duplicates in the 'object' column with the following code, but I can't delete the entire row containing the duplicate, because the second column is not deleted.


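import pandas as pd

# directory: path to the CSV file; with header=None the columns are labelled 0 and 1
df = pd.read_csv(directory, header=None)

objects = df[0]

for obj in df[0]:
    pass  # loop body not included in the original question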
Fabix

Comment: Potential duplicate of: https://stackoverflow.com/questions/50885093/how-do-i-remove-rows-with-duplicate-values-of-columns-in-pandas-data-frame (Jun 15 '21 at 15:47)

4 Answers


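Build a boolean mask with duplicated() on the 'object' column and negate it with ~, so that only the first occurrence of each value is selected: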

df = df[~df["object"].duplicated()]

Which gives

   object       type
0   apple      fruit
1    ball        toy
2  banana      fruit
3    xbox  videogame
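This answer assumes the columns are named "object" and "type". The question reads the CSV with header=None, so there the first column is labelled 0 instead. A minimal sketch under that assumption, using io.StringIO with the question's rows as a stand-in for the real file:

```python
import io

import pandas as pd

# Stand-in for the question's CSV file (no header row), same rows as the example.
csv_text = """apple,fruit
ball,toy
banana,fruit
xbox,videogame
banana,fruit
apple,fruit
"""

# With header=None, pandas labels the columns 0 and 1 instead of "object"/"type".
df = pd.read_csv(io.StringIO(csv_text), header=None)

# duplicated() is True for every repeat of a value after its first occurrence.
mask = df[0].duplicated()

# Negating the mask keeps the first occurrence of each value in column 0,
# together with the rest of that row.
deduped = df[~mask]
print(deduped)
```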
crayxt

Use the drop_duplicates method:

import pandas as pd

d = pd.DataFrame(
    {'object': ['apple', 'ball', 'banana', 'xbox', 'banana', 'apple'],
     'type': ['fruit', 'toy', 'fruit', 'videogame', 'fruit', 'fruit']}
)
d.drop_duplicates()

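There are several keyword arguments that might come in handy, such as inplace=True if you want the DataFrame d itself to be updated. Note that without subset, a row is dropped only when all of its columns match an earlier row; pass subset='object' to compare the first column only.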
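To see why the column choice matters, here is a small sketch with one hypothetical extra row ('apple', 'company') added to the example data, so that the 'object' value repeats while the full row does not:

```python
import pandas as pd

# Example data plus one hypothetical row whose "object" repeats but "type" differs.
d = pd.DataFrame(
    {'object': ['apple', 'ball', 'banana', 'xbox', 'banana', 'apple', 'apple'],
     'type': ['fruit', 'toy', 'fruit', 'videogame', 'fruit', 'fruit', 'company']}
)

# Default: a row counts as a duplicate only if every column matches an earlier
# row, so ('apple', 'company') survives.
all_columns = d.drop_duplicates()

# subset='object': only the first column is compared, as the question asks.
first_column_only = d.drop_duplicates(subset='object')

print(all_columns)
print(first_column_only)
```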


You can use .drop_duplicates() with parameter subset='object' to select the column you want to check, as follows:

df_out = df.drop_duplicates(subset='object')

Result:

print(df_out)

   object       type
0   apple      fruit
1    ball        toy
2  banana      fruit
3    xbox  videogame
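drop_duplicates also takes a keep argument that controls which occurrence survives, and (in pandas 1.0 and later) ignore_index to renumber the result. A short sketch on the question's data:

```python
import pandas as pd

df = pd.DataFrame(
    {'object': ['apple', 'ball', 'banana', 'xbox', 'banana', 'apple'],
     'type': ['fruit', 'toy', 'fruit', 'videogame', 'fruit', 'fruit']}
)

# keep='first' (the default) keeps the first occurrence of each value.
first = df.drop_duplicates(subset='object', keep='first')

# keep='last' keeps the last occurrence; original row order is preserved.
last = df.drop_duplicates(subset='object', keep='last')

# keep=False drops every row whose value appears more than once.
unique_only = df.drop_duplicates(subset='object', keep=False)

# ignore_index=True relabels the resulting index as 0..n-1.
renumbered = df.drop_duplicates(subset='object', keep='last', ignore_index=True)

print(last)
print(unique_only)
```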
SeaBean

To get the number of rows removed by dropping duplicates (the original length minus the length after dropping):

n_dropped = len(df) - len(df.drop_duplicates())
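The same count can be obtained by summing the boolean Series returned by duplicated(), without building a second DataFrame. A sketch on the question's data, also showing the count for the first column only:

```python
import pandas as pd

df = pd.DataFrame(
    {'object': ['apple', 'ball', 'banana', 'xbox', 'banana', 'apple'],
     'type': ['fruit', 'toy', 'fruit', 'videogame', 'fruit', 'fruit']}
)

# Rows removed when all columns are compared.
n_dropped = len(df) - len(df.drop_duplicates())

# duplicated() marks repeats as True; summing counts them.
n_dup_rows = int(df.duplicated().sum())

# Restricting the check to the first column, as in the question.
n_dup_object = int(df['object'].duplicated().sum())

print(n_dropped, n_dup_rows, n_dup_object)
```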
Derrick Kuria