Let's say I have a dataframe:

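import pandas as pd

a = [1,1,2,3,4]
b = [1,1,6,7,8]
c = [2,9,3,4,5]
# columns must be an ordered list (a set has no defined order); order matches the output below
ab = pd.DataFrame(list(zip(a, b, c)), columns=['col2', 'col3', 'col1'])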
ab
   col2  col3  col1
0     1     1     2
1     1     1     9
2     2     6     3
3     3     7     4
4     4     8     5

Now let's say I want the rows that are unique across n columns (here col2 and col3, but I would love a general example for n columns), while keeping all columns in the dataframe and omitting only the duplicate rows, as shown below.

   col2  col3  col1
0     1     1     2
2     2     6     3
3     3     7     4
4     4     8     5

What would be the best way to do this?

This is similar to the question "Subset with unique cases, based on multiple columns", but in Python.

ben890

1 Answer

You could write a function for more generality:

def drop_dupes(df, cols):
    return df[~df[cols].duplicated(keep='first')]

print(drop_dupes(ab, ['col2', 'col3']))
   col2  col3  col1
0     1     1     2
2     2     6     3
3     3     7     4
4     4     8     5
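
As a possible alternative, pandas also provides `DataFrame.drop_duplicates`, whose `subset` argument takes the list of columns to compare while the returned rows keep every column. The sketch below assumes the frame from the question (rebuilt here so the block runs on its own); `drop_dupes_builtin` is a hypothetical name used only for illustration.

```python
import pandas as pd

# Same data as in the question, with columns in the order of the printed frame.
ab = pd.DataFrame(
    {
        "col2": [1, 1, 2, 3, 4],
        "col3": [1, 1, 6, 7, 8],
        "col1": [2, 9, 3, 4, 5],
    }
)


def drop_dupes_builtin(df, cols, keep="first"):
    """Hypothetical helper: drop rows that duplicate another row on `cols`.

    Only the columns listed in `cols` are compared; all columns are returned.
    """
    return df.drop_duplicates(subset=cols, keep=keep)


result = drop_dupes_builtin(ab, ["col2", "col3"])
print(result)
```

Passing `keep='last'` would retain the last row of each duplicated group instead of the first, and `keep=False` would drop every row that has a duplicate on the chosen columns.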
manwithfewneeds