21

I want to extract all unique combinations of values of columns Col1, Col2 and Col3. Let's say there is the following dataframe df:

df =

Col1    Col2    Col3
12      AB      13
11      AB      13
12      AB      13
12      AC      14

The answer is:

unique =

Col1    Col2    Col3
12      AB      13
11      AB      13
12      AC      14

I know how to obtain unique values of a particular column, i.e. df.Col1.unique(), however not sure about unique combinations.

Julien Marrec
  • 11,605
  • 4
  • 46
  • 63
Dinosaurius
  • 8,306
  • 19
  • 64
  • 113
  • 1
    Does this answer your question? [Finding unique combinations of columns from a dataframe](https://stackoverflow.com/questions/49110156/finding-unique-combinations-of-columns-from-a-dataframe) – toliveira Sep 17 '20 at 11:54

2 Answers2

28

There is a method for this - pandas.DataFrame.drop_duplicates:

>>> df.drop_duplicates()
   Col1 Col2  Col3
0    12   AB    13
1    11   AB    13
3    12   AC    14

You can do it inplace as well:

>>> df.drop_duplicates(inplace=True)
>>> df
   Col1 Col2  Col3
0    12   AB    13
1    11   AB    13
3    12   AC    14

If you need to get unique values of certain columns:

>>> df[['Col2','Col3']].drop_duplicates()
  Col2  Col3
0   AB    13
3   AC    14

as @jezrael suggests, you can also consider using subset parameter of drop_duplicates():

>>> df.drop_duplicates(subset=['Col2','Col3'])
   Col1 Col2  Col3
0    12   AB    13
3    12   AC    14
Julien Marrec
  • 11,605
  • 4
  • 46
  • 63
Roman Pekar
  • 107,110
  • 28
  • 195
  • 197
  • What if I can much more columns? I only need to get unique values of three columns. Should it be something like this?: `df[["Col1","Col2","Col3"]].drop_duplicates()` – Dinosaurius Dec 24 '16 at 10:54
  • 1
    @RomanPekar - I think OP wants `print (df.drop_duplicates(subset=['Col1','Col2','Col3']))` – jezrael Dec 24 '16 at 11:02
  • @jezrael I think he needs to get only unique values of certain columns, but I'll add your suggestion to the answer, thanks – Roman Pekar Dec 24 '16 at 11:08
  • @RomanPekar - Yes, but if there is multiple columns and need only filter by 3 columns, need subset. But it depends what OP exactly need. – jezrael Dec 24 '16 at 11:09
0

You could also use a unique() method on the index:

index = ['Col2','Col3']
df.set_index(index).index.unique().to_frame(False)

And you might be interested in knowing the number of time each combination is repeated using value_count as explained in this answer

df[index].value_counts()
Paul Rougieux
  • 10,289
  • 4
  • 68
  • 110