Why do the following two statements return different values for the same dataset? sum(dataframe['column'].value_counts()==1) vs dataframe['column'].nunique()

Note: The column under consideration is of string type

Shri ram
  • If you can do this: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples then I or others can probably easily help you. – David Erickson Jun 09 '20 at 03:56
  • Thank you for the heads-up. I shall try posting some sample code in the questions next time onwards. – Shri ram Jun 13 '20 at 12:12

1 Answer

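This is because the two expressions measure different things. value_counts() == 1 is True only for values that occur exactly once, so the sum counts the values that have no duplicates, whereas .nunique() counts every distinct value, including those that repeat. NaN is not the cause: both .value_counts() and .nunique() exclude NaN values by default.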

For example,

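import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'column': [1, 1, 2, 3, np.nan]})
print(sum(dataframe['column'].value_counts() == 1))  # values occurring exactly once
print(dataframe['column'].nunique())  # distinct non-NaN values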

will give output:

2
3
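
To see where the two numbers come from, it may help to look at the intermediate counts. This is only a minimal sketch on the same toy values; the names counts, singletons and distinct are illustrative and not part of the question.

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({"column": [1, 1, 2, 3, np.nan]})

# Occurrences of each non-NaN value (NaN is dropped by default).
counts = dataframe["column"].value_counts()

# Values that occur exactly once: 2.0 and 3.0.
singletons = int((counts == 1).sum())

# Distinct non-NaN values: 1.0, 2.0 and 3.0.
distinct = dataframe["column"].nunique()

print(singletons, distinct)  # 2 3
```

The value 1 appears twice, so it contributes to the distinct count but not to the count of values that occur exactly once.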

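NaN values are already excluded by default, since dropna=True is the default. To count NaN as a distinct value instead, pass dropna=False:

dataframe['column'].nunique(dropna=False)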
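
The effect of the dropna parameter can be checked on a small string column, since the question concerns string data. This is a sketch on made-up values; the Series s and the variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical string column with one missing value.
s = pd.Series(["a", "a", "b", "c", np.nan])

# Default behaviour: NaN is ignored by both methods.
default_distinct = s.nunique()
default_counts = s.value_counts()

# dropna=False: NaN is treated as one additional value.
with_nan_distinct = s.nunique(dropna=False)
with_nan_counts = s.value_counts(dropna=False)

print(default_distinct, with_nan_distinct)      # 3 4
print(len(default_counts), len(with_nan_counts))  # 3 4
```

In both settings the two methods agree on how NaN is treated, so any difference between the original two expressions comes from repeated values.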

Hope this helped you :)

Sahith Kurapati