Grab Distinct Rows across n columns, but keep all columns in dataframe

Question

Let's say I have a dataframe:

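import pandas as pd

a = [1,1,2,3,4]
b = [1,1,6,7,8]
c = [2,9,3,4,5]
# Use a list (not a set) for columns so the column order is deterministic
ab = pd.DataFrame(list(zip(a, b, c)), columns=['col2', 'col3', 'col1'])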
ab
   col2  col3  col1
0     1     1     2
1     1     1     9
2     2     6     3
3     3     7     4
4     4     8     5

And let's say I want the rows that are unique across n columns (in this case col2 and col3, but I would love a general n-column example), while keeping all columns in the dataframe and dropping only the duplicate rows, as shown below.

   col2  col3  col1
0     1     1     2
2     2     6     3
3     3     7     4
4     4     8     5

What would be the best way to do this?

This is similar to the question "Subset with unique cases, based on multiple columns", but for Python.

score 1 · Accepted Answer · answered Oct 18 '19 at 00:49


You could write a function for more generality:

def drop_dupes(df, cols):
    # Keep the first row for each distinct combination of values in cols
    return df[~df[cols].duplicated(keep='first')]

print(drop_dupes(ab, ['col2', 'col3']))
   col2  col3  col1
0     1     1     2
2     2     6     3
3     3     7     4
4     4     8     5

answered Oct 18 '19 at 00:49

manwithfewneeds


`drop_duplicates()` is also an option. – Quang Hoang Oct 18 '19 at 00:50
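
For reference, the `drop_duplicates()` route mentioned in the comment might look like the sketch below. It assumes the question's data with the column order shown in the question's output; the variable name `result` is only illustrative. The `subset` argument limits the duplicate check to the listed columns, while every column is kept in the returned frame.

```python
import pandas as pd

# Rebuild the question's frame with an explicit column order (assumed from its output).
ab = pd.DataFrame(
    {"col2": [1, 1, 2, 3, 4], "col3": [1, 1, 6, 7, 8], "col1": [2, 9, 3, 4, 5]}
)

# Compare rows only on col2 and col3; keep the first row of each duplicate group.
# The list passed to subset can hold any number of column names.
result = ab.drop_duplicates(subset=["col2", "col3"], keep="first")
print(result)
```

This should drop the row at index 1 and leave the rows at index 0, 2, 3 and 4, the same as the function in the answer.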
