Sum of value_counts()==1 vs nunique()?

Question

Why do the following two statements return different values for the same dataset?

sum(dataframe['column'].value_counts()==1)
vs
dataframe['column'].nunique()

Note: The column under consideration is of string type

If you can do this: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples then I or others can probably easily help you. — David Erickson, Jun 09 '20 at 03:56
Thank you for the heads-up. I will try to include sample code in my questions from now on. — Shri ram, Jun 13 '20 at 12:12

Sahith Kurapati · Answer 1 · 2020-06-09T04:11:21.873


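This is because the two expressions count different things. .value_counts() returns the frequency of each distinct value, so sum(dataframe['column'].value_counts()==1) counts only the values that appear exactly once. .nunique() counts every distinct value, however many times it appears. Missing values are not the cause: both methods drop NaN by default (dropna=True).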

For example,

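import numpy as np
import pandas as pd
dataframe = pd.DataFrame({'column': [1, 1, 2, 3, np.nan]})
print(sum(dataframe['column'].value_counts()==1))  # values seen exactly once: 2 and 3
print(dataframe['column'].nunique())  # distinct non-NaN values: 1, 2 and 3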

gives the output:

2
3
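To see the relationship directly, here is a minimal sketch on a string column (the question's case), using made-up sample values. value_counts() lists each distinct value with its frequency, so nunique() equals the length of that Series, while comparing with == 1 keeps only the values that occur once. The variable names are illustrative only.

```python
import pandas as pd

# Hypothetical string column: "b" appears twice, "a" and "c" once, None is missing.
s = pd.Series(["a", "b", "b", "c", None])

counts = s.value_counts()               # frequency of each distinct non-null value
singletons = int((counts == 1).sum())   # values that appear exactly once
distinct = s.nunique()                  # number of distinct non-null values

print(singletons, distinct, len(counts))
```

The gap between the two numbers is the count of values that occur more than once (here only "b").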

NaN is already excluded by default. If you want NaN to count as a distinct value, pass dropna=False:

dataframe['column'].nunique(dropna=False)
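A short sketch of how dropna affects both methods, reusing the numeric example above. With the default dropna=True neither method sees NaN; with dropna=False, NaN becomes one more distinct value (and, occurring once here, one more singleton).

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 1, 2, 3, np.nan])

# Default (dropna=True): both methods ignore NaN.
default_distinct = s.nunique()
default_singletons = int((s.value_counts() == 1).sum())

# dropna=False: NaN is treated as one more distinct value.
nan_distinct = s.nunique(dropna=False)
nan_singletons = int((s.value_counts(dropna=False) == 1).sum())

print(default_distinct, default_singletons)
print(nan_distinct, nan_singletons)
```

Either way the two numbers still differ, because the value 1 occurs twice.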

Hope this helped you :)

edited Jun 09 '20 at 04:11

answered Jun 09 '20 at 03:57

Sahith Kurapati

