Generally, to count how many times each distinct value occurs in a single column, you can use Series.value_counts:
df.domain.value_counts()
#'vk.com' 5
#'twitter.com' 2
#'facebook.com' 1
#'google.com' 1
#Name: domain, dtype: int64
To see how many unique values a column contains, use Series.nunique:
df.domain.nunique()
# 4
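The frame behind these outputs is not shown here, so the following is only a minimal sketch on a made-up frame. The ID column, its values, and the row order are assumptions chosen to match the counts above, and the domain strings are written without the embedded quote characters.

```python
import pandas as pd

# Hypothetical sample data: the ID column and its values are assumptions.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

counts = df["domain"].value_counts()  # occurrences of each distinct value
n_distinct = df["domain"].nunique()   # number of distinct values

print(counts)
print(n_distinct)
```

Both methods skip missing values by default (dropna=True), so the length of the value_counts result equals nunique.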
To get the distinct values themselves, you can use unique or drop_duplicates. The slight difference between the two is that unique returns a numpy.ndarray while drop_duplicates returns a pandas.Series:
df.domain.unique()
# array(["'vk.com'", "'twitter.com'", "'facebook.com'", "'google.com'"], dtype=object)
df.domain.drop_duplicates()
#0 'vk.com'
#2 'twitter.com'
#4 'facebook.com'
#6 'google.com'
#Name: domain, dtype: object
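As a rough check of the return types of the two calls above, here is a small sketch on a made-up Series. The values are assumptions, and dtype=object is set explicitly to mirror the dtype shown in the output.

```python
import numpy as np
import pandas as pd

# Hypothetical column; dtype=object mirrors the dtype in the output above.
domain = pd.Series(
    ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
     "vk.com", "google.com", "twitter.com", "vk.com"],
    name="domain",
    dtype=object,
)

as_array = domain.unique()            # NumPy array, original index is lost
as_series = domain.drop_duplicates()  # Series, keeps index of first occurrence

print(type(as_array).__name__)
print(as_series.index.tolist())
```

The Series form is handy when you still need the row labels of the first occurrences. For columns backed by a pandas extension dtype, unique returns an extension array of that dtype instead of a NumPy array.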
As for this specific problem, since you'd like to count distinct values with respect to another variable, besides the groupby method provided by other answers here, you can also simply drop duplicate rows first and then call value_counts():
import pandas as pd
df.drop_duplicates().domain.value_counts()
# 'vk.com' 3
# 'twitter.com' 2
# 'facebook.com' 1
# 'google.com' 1
# Name: domain, dtype: int64
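To compare this with the groupby route, here is a sketch on a made-up two-column frame. The ID column name and all values are assumptions, since the original data is not shown in this answer.

```python
import pandas as pd

# Hypothetical sample data: the ID column and its values are assumptions.
df = pd.DataFrame({
    "ID": [123, 123, 123, 456, 456, 456, 456, 789, 789],
    "domain": ["vk.com", "vk.com", "twitter.com", "vk.com", "facebook.com",
               "vk.com", "google.com", "twitter.com", "vk.com"],
})

# Route 1: drop repeated (ID, domain) rows, then count rows per domain.
by_dedup = df.drop_duplicates().domain.value_counts()

# Route 2: count distinct IDs within each domain group.
by_group = df.groupby("domain")["ID"].nunique()

print(by_dedup.to_dict())
print(by_group.to_dict())
```

The two results agree here because the frame has only these two columns. If the frame has more columns, passing subset=["ID", "domain"] to drop_duplicates restricts the comparison to that pair.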