Advanced Describe Pandas

Question

Is there a more advanced function like the describe that pandas has? Normally I would do:

    import numpy as np
    import pandas as pd
    r = pd.DataFrame(np.random.randn(1000), columns=['A'])
    r.describe()

and I get a nice summary like this one:

                    A
    count  1000.000000
    mean      0.010230
    std       0.982562
    min      -2.775969
    25%      -0.664840
    50%       0.015452
    75%       0.694440
    max       3.101434

Can I find something a little more elaborate, maybe in statsmodels or scipy?

I second Jeff's comment. This question is currently too vague to answer. – Paul H, May 30 '14 at 18:20
I was looking for something like describe in statsmodels, with sums, modes, skewness, kurtosis and maybe more. Any ideas? I think I have seen something similar in statsmodels. – Uninvited Guest, May 30 '14 at 21:01
There was a Describe function or class under development in statsmodels, but nobody has looked at it in a long time, since pandas now covers almost all of this area. – Josef, May 31 '14 at 04:26

score 13 · Accepted Answer · answered May 31 '14 at 17:50


    from scipy.stats import describe
    describe(r, axis=0)  # r is the DataFrame from the question

It will give you the size (number of observations), (min, max), mean, variance, skewness, and kurtosis.
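describe returns a named tuple (DescribeResult), so its fields can be read by name. As a sketch, assuming a purely numeric DataFrame, the hypothetical helper scipy_summary below lays those fields out in the same one-row-per-statistic shape as DataFrame.describe(). Keep in mind that scipy's variance uses ddof=1 while its skewness and kurtosis default to the biased estimates (bias=True), with Fisher kurtosis (normal == 0), so they will differ slightly from pandas' skew() and kurtosis().

```python
import pandas as pd
from scipy.stats import describe


def scipy_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tabulate scipy.stats.describe column by column.

    Assumes every column of df is numeric.
    """
    res = describe(df.to_numpy(dtype=float), axis=0)
    table = pd.DataFrame(
        {
            "nobs": res.nobs,          # scalar, broadcast to every column
            "min": res.minmax[0],
            "max": res.minmax[1],
            "mean": res.mean,
            "variance": res.variance,  # ddof=1
            "skewness": res.skewness,  # biased estimate by default
            "kurtosis": res.kurtosis,  # Fisher (excess) kurtosis, biased
        },
        index=df.columns,
    )
    # transpose so statistics are rows, like DataFrame.describe()
    return table.T
```

For the question's data this would be called as scipy_summary(r).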

answered May 31 '14 at 17:50

pbreach


score 9 · Answer 2 · edited Jun 20 '20 at 09:12


I'd rather leverage the pandas library (adding variance, skewness, and kurtosis) than use 'external' ones, for example:

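    # df is your DataFrame; describe() summarizes numeric columns by default
    stats = df.describe()
    # assigning a Series aligns on column labels, so .tolist() is not needed;
    # numeric_only=True avoids errors on non-numeric columns in recent pandas
    stats.loc['var'] = df.var(numeric_only=True)
    stats.loc['skew'] = df.skew(numeric_only=True)
    stats.loc['kurt'] = df.kurtosis(numeric_only=True)
    print(stats)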
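The same idea extends to the sums and modes mentioned in the question's comments. A sketch, assuming you only want numeric columns: the hypothetical helper describe_plus collects the extra rows with DataFrame.agg and DataFrame.mode, then stacks them under describe() with pd.concat. Because mode() can return several rows when values tie, the sketch keeps only the first (smallest) mode per column.

```python
import pandas as pd


def describe_plus(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: describe() plus sum, var, skew, kurt and mode rows."""
    num = df.select_dtypes("number")  # keep every piece on the same columns
    base = num.describe()
    # one row per aggregation, indexed by the function names
    extra = num.agg(["sum", "var", "skew", "kurt"])
    # mode() returns one row per tied mode (sorted); keep the first and relabel it
    mode_row = num.mode().iloc[[0]].rename(index={0: "mode"})
    return pd.concat([base, extra, mode_row])
```

Selecting the numeric columns up front means describe(), agg() and mode() all see the same columns, so the pieces line up when concatenated.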

P.S.: pandas_profiling is amazing, though.

Yerart

edited Jun 20 '20 at 09:12

Community


answered Sep 08 '19 at 10:49

yerartdev


score 7 · Answer 3 · edited Feb 04 '23 at 17:28


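    from IPython.display import display  # also available without import in Jupyter
    from ydata_profiling import ProfileReport
    eda = ProfileReport(df)  # df is your DataFrame
    display(eda)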

Pandas profiling (now published as ydata-profiling, which is what the import above uses) is a very powerful tool that gives you an almost complete EDA of your dataset: missing values, correlations, heat maps and what not!

edited Feb 04 '23 at 17:28

scls


answered May 22 '19 at 05:51

Varun Tyagi


One warning: when I used it in a Jupyter notebook, it caused problems with plotting because it resets something in the display mode. To reset it I ran %matplotlib inline, and plotting went back to normal. – user96265, Nov 23 '21 at 23:08

score 1 · Answer 4 · answered Feb 06 '23 at 17:27

I found this excellent solution after much searching. It is simple and extends the existing describe() method: a new function, describex(), adds two rows to the describe() output, one for kurtosis and one for skewness.

Custom function to add skewness and kurtosis to the descriptive statistics of a pandas DataFrame:

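    import pandas as pd
    
    def describex(data):
        data = pd.DataFrame(data)  # also accepts a Series or array
        stats = data.describe()
        skewness = data.skew()
        kurtosis = data.kurtosis()
        # one-row DataFrames, labelled by the statistic name
        skewness_df = pd.DataFrame({'skewness':skewness}).T
        kurtosis_df = pd.DataFrame({'kurtosis':kurtosis}).T
        # note: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0
        return stats.append([kurtosis_df,skewness_df])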

It is similar to the previous answer, but creates a callable function.

source: https://gist.github.com/chkoar/5cb11b22b6733cbd408912b518e43a94

score 0 · Answer 5 · answered Jul 19 '23 at 22:14

UPDATE: the ".append" method was deprecated and has since been removed from pandas (in version 2.0). To keep using the same function with as little disruption as possible, use the "._append" method instead.

Here is the updated code:

    import pandas as pd
    
    def describex(data):
        data = pd.DataFrame(data)
        stats = data.describe()
        skewness = data.skew()
        kurtosis = data.kurtosis()
        skewness_df = pd.DataFrame({'skewness':skewness}).T
        kurtosis_df = pd.DataFrame({'kurtosis':kurtosis}).T
        return stats._append([kurtosis_df,skewness_df])

Everything is the same except for the underscore "_" preceding "append": "._append".

".append" vs "._append"

Reference: DataFrame object has no attribute append

No, `_append` is a **private** method. It could be removed from the pandas API, or its behavior altered, without any warning. `append` was removed for a good reason; don't recommend a function that could be worse... – mozway, Aug 08 '23 at 05:25
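A version of describex that sticks to the public API can use pd.concat, which is the documented replacement for the removed DataFrame.append. This is a sketch rather than the gist author's code: passing numeric_only=True to skew() and kurtosis() is an addition that keeps them from failing on non-numeric columns in recent pandas, matching describe(), which summarizes numeric columns by default.

```python
import pandas as pd


def describex(data) -> pd.DataFrame:
    """describe() plus kurtosis and skewness rows, built with public pandas API."""
    data = pd.DataFrame(data)  # also accepts a Series or array
    stats = data.describe()
    # turn each Series of per-column results into a one-row DataFrame
    kurtosis_df = data.kurtosis(numeric_only=True).to_frame("kurtosis").T
    skewness_df = data.skew(numeric_only=True).to_frame("skewness").T
    # stack the extra rows under the describe() output
    return pd.concat([stats, kurtosis_df, skewness_df])
```

The row order (kurtosis, then skewness) follows the original answer.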
