Select a column in dataframe and mask the duplicate

Question

I have a DataFrame like this:

import pandas as pd

dict_data = {
    'Date':pd.Timestamp('20200720'),
    'Number': 123,
    'course':pd.Series(['Python', 'Quant', 'CFA', 'Finance', 'Python', 'Python', 'Finance', 'Finance']),
    'Company':['AA', 'BB', 'CC', 'DD', 'BB', 'BB', 'DD', 'CC']
}

pd.DataFrame(dict_data)
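A minimal sketch of the same setup, assuming the result is bound to a variable named df (a name chosen here for illustration; the snippet above does not assign it). It shows that the scalar values for Date and Number are broadcast to the length of the course Series, so the frame has 8 rows and 4 columns.

```python
import pandas as pd

dict_data = {
    'Date': pd.Timestamp('20200720'),  # scalar, repeated on every row
    'Number': 123,                     # scalar, repeated on every row
    'course': pd.Series(['Python', 'Quant', 'CFA', 'Finance',
                         'Python', 'Python', 'Finance', 'Finance']),
    'Company': ['AA', 'BB', 'CC', 'DD', 'BB', 'BB', 'DD', 'CC'],
}

# 'df' is an illustrative name; the Series supplies the 0..7 index
df = pd.DataFrame(dict_data)
print(df.shape)
print(df['course'])
```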

I can select a column, for example dict_data['course'], and it outputs all the data in that column. Is there a method to mask the duplicate values so that the output looks like this?

0     Python
1      Quant
2        CFA
3    Finance

score 2 · Accepted Answer · answered Oct 20 '20 at 16:51

You can call drop_duplicates() on the selected column (Series.drop_duplicates()):

df = pd.DataFrame(dict_data)

In [1327]: df.course.drop_duplicates()
Out[1327]: 
0     Python
1      Quant
2        CFA
3    Finance
Name: course, dtype: object
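A self-contained sketch of the same approach, assuming the DataFrame is named df. drop_duplicates() keeps the first occurrence of each value by default (keep='first') and preserves the original index labels, which is why the result above is indexed 0 to 3. If "mask" is instead meant literally, i.e. hide repeats but keep every row, one option (an alternative not in the original answer) is to combine Series.duplicated() with Series.mask().

```python
import pandas as pd

dict_data = {
    'Date': pd.Timestamp('20200720'),
    'Number': 123,
    'course': pd.Series(['Python', 'Quant', 'CFA', 'Finance',
                         'Python', 'Python', 'Finance', 'Finance']),
    'Company': ['AA', 'BB', 'CC', 'DD', 'BB', 'BB', 'DD', 'CC'],
}
df = pd.DataFrame(dict_data)

# Keep only the first occurrence of each course; index labels are preserved
unique_courses = df['course'].drop_duplicates()
print(unique_courses)

# Alternative: keep all 8 rows, replacing repeated values with NaN.
# duplicated() is True for every occurrence after the first;
# mask() replaces values where the condition is True (NaN by default).
masked_courses = df['course'].mask(df['course'].duplicated())
print(masked_courses)
```

Use drop_duplicates() when you want the shorter list shown in the question, and the mask() variant when the result must stay aligned with the other columns of df.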

Mayank Porwal