
By default, the describe method of a Dask DataFrame summarizes only numerical columns. According to the docs, I should be able to get descriptions of categorical columns by providing the include parameter. However,

df.describe(include=['category']).compute()

leads to a

TypeError: describe() got an unexpected keyword argument 'include'.

I also tried a slightly different approach:

df.select_dtypes(include=['category']).describe().compute()

and this time I get

ValueError: DataFrame contains only non-numeric data.

Could you please advise on the best way to summarize categorical columns in a Dask DataFrame?

grześ

1 Answer


Summarizing only numerical or object columns

  1. To call describe() on just the numerical columns, use describe(include=[np.number]).
  2. To call describe() on just the object (string) columns, use describe(include=['O']).

Quote: Pandas 'describe' is not returning summary of all columns
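As a minimal sketch of the two calls above, here is how the include parameter behaves on a plain pandas DataFrame. The sample data and column names are made up for illustration, and the 'category' selector is added only to mirror the question. Whether a Dask DataFrame accepts include depends on the installed Dask version (the TypeError in the question suggests that version did not), so treat this as pandas behavior rather than a guaranteed Dask answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame(
    {
        "price": [1.0, 2.5, 4.0, 2.5],
        # Force object dtype so the 'O' selector matches on any pandas version.
        "city": pd.Series(["Oslo", "Rome", "Oslo", "Lima"], dtype=object),
        "grade": pd.Categorical(["a", "b", "a", "a"]),
    }
)

# Numerical columns only: count, mean, std, min, quartiles, max.
numeric_summary = df.describe(include=[np.number])

# Object (string) columns only: count, unique, top, freq.
object_summary = df.describe(include=["O"])

# Categorical columns only: count, unique, top, freq.
category_summary = df.describe(include=["category"])

print(numeric_summary)
print(object_summary)
print(category_summary)
```

If the installed Dask version rejects include, one fallback is to compute a small sample (or the selected columns) into pandas first and call describe there, at the cost of pulling that data into memory.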

Hugo