Why do the following two statements return different values for a dataset? sum(dataframe['column'].value_counts()==1) vs dataframe['column'].nunique()
Note: The column under consideration is of string type
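This is because the two expressions measure different things. sum(dataframe['column'].value_counts()==1) counts the values that occur exactly once, whereas dataframe['column'].nunique() counts the distinct values, however many times each one occurs. The results match only when no value is repeated. Both ignore NaN by default, so missing values do not explain the gap.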
For example,
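import numpy as np
import pandas as pd
dataframe = pd.DataFrame({'column': [1, 1, 2, 3, np.nan]})
print(sum(dataframe['column'].value_counts()==1))  # values seen exactly once: 2 and 3
print(dataframe['column'].nunique())  # distinct non-NaN values: 1, 2 and 3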
will give output:
2
3
To count NaN as a distinct value too, pass dropna=False (the default is dropna=True, which already excludes NaN):
dataframe['column'].nunique(dropna=False)
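Since the column in the question holds strings, here is the same comparison as a small sketch on a made-up string Series. The values and variable names are only illustrative assumptions, not taken from the original dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical string column: "a" is repeated, and one entry is missing.
s = pd.Series(["a", "a", "b", "c", np.nan], name="column")

counts = s.value_counts()                    # NaN is excluded by default (dropna=True)
appear_once = int((counts == 1).sum())       # values occurring exactly once: "b", "c"
distinct = s.nunique()                       # distinct non-missing values: "a", "b", "c"
distinct_with_nan = s.nunique(dropna=False)  # NaN counted as one extra value

print(appear_once, distinct, distinct_with_nan)  # prints: 2 3 4
```

If the goal is the number of distinct values, len(counts) gives the same answer as nunique(). The ==1 filter is what drops the repeated values from the first count.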
Hope this helped you :)