Case 1: all datatypes are numeric:
df.describe() works fine after df.assign(...) when all columns are numeric. Here's a reproducible example:
>>> import pandas as pd
>>> df = pd.DataFrame([[1,2],[3,4]], columns=list('AB'))
>>> df
   A  B
0  1  2
1  3  4
>>> import numpy as np
>>> df["C"] = np.nan
>>> df
   A  B   C
0  1  2 NaN
1  3  4 NaN
>>> df.describe()
              A         B    C
count  2.000000  2.000000  0.0
mean   2.000000  3.000000  NaN
std    1.414214  1.414214  NaN
min    1.000000  2.000000  NaN
25%    1.500000  2.500000  NaN
50%    2.000000  3.000000  NaN
75%    2.500000  3.500000  NaN
max    3.000000  4.000000  NaN
>>> df.assign(D=5)
   A  B   C  D
0  1  2 NaN  5
1  3  4 NaN  5
>>> df.describe()
              A         B    C
count  2.000000  2.000000  0.0
mean   2.000000  3.000000  NaN
std    1.414214  1.414214  NaN
min    1.000000  2.000000  NaN
25%    1.500000  2.500000  NaN
50%    2.000000  3.000000  NaN
75%    2.500000  3.500000  NaN
max    3.000000  4.000000  NaN
>>> df = df.assign(D=5)
>>> df.describe()
              A         B    C    D
count  2.000000  2.000000  0.0  2.0
mean   2.000000  3.000000  NaN  5.0
std    1.414214  1.414214  NaN  0.0
min    1.000000  2.000000  NaN  5.0
25%    1.500000  2.500000  NaN  5.0
50%    2.000000  3.000000  NaN  5.0
75%    2.500000  3.500000  NaN  5.0
max    3.000000  4.000000  NaN  5.0
>>>
- df.assign returns a new DataFrame and leaves the original unchanged, so make sure you assign the result back to df: df = df.assign(...)
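The same point as a standalone script, as a minimal sketch that assumes only that pandas is installed. The name `result` is introduced here purely for illustration:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=list("AB"))

# assign returns a new DataFrame; df itself is not modified
result = df.assign(D=5)
print(list(df.columns))      # D is absent from the original
print(list(result.columns))  # D is present in the returned copy

# Rebinding the name is what makes describe() see the new column
df = df.assign(D=5)
summary = df.describe()
print(summary.columns.tolist())
```

If the column does not show up in describe(), checking df.columns first is a quick way to tell whether the assign result was actually kept.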
Case 2: mixed numeric and object datatypes:
For mixed object and numeric datatypes, you need to call df.describe(include='all'), as mentioned in the Notes section of the documentation:
For mixed data types provided via a DataFrame, the default is to
return only an analysis of numeric columns. If include='all' is
provided as an option, the result will include a union of attributes
of each type.
>>> df["E"] = ['1','2']
>>> df
   A  B   C  D  E
0  1  2 NaN  5  1
1  3  4 NaN  5  2
>>> df.describe()
              A         B    C    D
count  2.000000  2.000000  0.0  2.0
mean   2.000000  3.000000  NaN  5.0
std    1.414214  1.414214  NaN  0.0
min    1.000000  2.000000  NaN  5.0
25%    1.500000  2.500000  NaN  5.0
50%    2.000000  3.000000  NaN  5.0
75%    2.500000  3.500000  NaN  5.0
max    3.000000  4.000000  NaN  5.0
>>> df
   A  B   C  D  E
0  1  2 NaN  5  1
1  3  4 NaN  5  2
>>>
By default the object column E is left out of the summary, so you need to call describe as follows:
>>> df.describe(include='all')
               A         B    C    D    E
count   2.000000  2.000000  0.0  2.0    2
unique       NaN       NaN  NaN  NaN    2
top          NaN       NaN  NaN  NaN    2
freq         NaN       NaN  NaN  NaN    1
mean    2.000000  3.000000  NaN  5.0  NaN
std     1.414214  1.414214  NaN  0.0  NaN
min     1.000000  2.000000  NaN  5.0  NaN
25%     1.500000  2.500000  NaN  5.0  NaN
50%     2.000000  3.000000  NaN  5.0  NaN
75%     2.500000  3.500000  NaN  5.0  NaN
max     3.000000  4.000000  NaN  5.0  NaN
>>>
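If you only want the non-numeric summary, describe also takes include and exclude lists of dtypes. A short sketch on a small frame made up for illustration (the variable names are hypothetical, not from the question):

```python
import numpy as np
import pandas as pd

# Small mixed frame: one numeric column, one column of strings
df = pd.DataFrame({"A": [1, 3], "E": ["1", "2"]})

numeric_only = df.describe()                     # default: numeric columns only
everything = df.describe(include="all")          # union of numeric and non-numeric stats
non_numeric = df.describe(exclude=[np.number])   # only the non-numeric columns

print(numeric_only.columns.tolist())
print(everything.columns.tolist())
print(non_numeric)
```

Excluding np.number keeps the output limited to rows that make sense for strings (count, unique, top, freq) instead of padding the table with NaN.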