I am new to Python and pandas. My question is related to this question: Advanced Describe Pandas

Is it possible to add more functions to the answer by noobie, such as geometric mean, weighted mean, harmonic mean, geometric standard deviation, etc.?

import pandas as pd

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness = data.skew()
    kurtosis = data.kurtosis()
    skewness_df = pd.DataFrame({'skewness':skewness}).T
    kurtosis_df = pd.DataFrame({'kurtosis':kurtosis}).T
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
    return pd.concat([stats, kurtosis_df, skewness_df])

Basically, I want to add statistics that do not come from pandas itself, for example from scipy.stats. I want much more information from the descriptive statistics than the standard describe offers. So far I have added more pandas functions, and that works, but I have not been able to attach functions from outside pandas. How do I do that?

1 Answer

There are a couple of things you could do.

One suggestion is to use the pandas-profiling library (renamed ydata-profiling in 2023), which generates a report on the data covering basic statistics, correlations, data types, missing values, and more. It is a quick way to get a comprehensive overview of a dataset.

Another suggestion is to call functions from scipy.stats inside your custom function. It covers most of the statistics you listed, for example gmean (geometric mean), hmean (harmonic mean), and gstd (geometric standard deviation).

For example,

import pandas as pd
import numpy as np
from scipy.stats import gmean

# Sample data; values start at 1 because a single 0 makes the geometric mean of a column 0
df = pd.DataFrame(np.random.randint(1, 100, size=(100, 4)), columns=list('ABCD'))

def describex(data):
    data = pd.DataFrame(data)
    stats = data.describe()
    skewness = data.skew()
    kurtosis = data.kurtosis()
    skewness_df = pd.DataFrame({'skewness':skewness}).T
    kurtosis_df = pd.DataFrame({'kurtosis':kurtosis}).T
    # Apply gmean column by column to the argument `data`, not the global `df`
    gmean_df = pd.DataFrame({'gmean': data.apply(gmean)}).T
    # DataFrame.append was removed in pandas 2.0; pd.concat stacks the rows instead
    return pd.concat([stats, kurtosis_df, skewness_df, gmean_df])

print(describex(df))
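
The same pattern should extend to the other statistics mentioned in the question. Below is a minimal sketch, not a definitive implementation: the function name `describe_extra` and the `weights` parameter are made up for illustration, the weighted mean uses `np.average` rather than anything from scipy.stats, and the data is assumed to be numeric and strictly positive (which `gmean`, `hmean`, and `gstd` expect).

```python
import numpy as np
import pandas as pd
from scipy.stats import gmean, gstd, hmean


def describe_extra(data, weights=None):
    """describe() plus a few extra statistics (hypothetical helper).

    `weights`, if given, is assumed to hold one weight per row.
    """
    data = pd.DataFrame(data)
    values = data.to_numpy(dtype=float)

    # One entry per statistic, each holding one value per column.
    rows = {
        'skewness': data.skew(),
        'kurtosis': data.kurtosis(),
        'gmean': gmean(values, axis=0),
        'hmean': hmean(values, axis=0),
        'gstd': gstd(values, axis=0),
    }
    if weights is not None:
        # Weighted arithmetic mean of each column.
        rows['wmean'] = np.average(values, axis=0, weights=weights)

    extra = pd.DataFrame(rows, index=data.columns).T
    # Stack the extra rows under the standard describe() output.
    return pd.concat([data.describe(), extra])


df = pd.DataFrame({'A': [1, 2, 4, 8], 'B': [1, 1, 4, 4]})
print(describe_extra(df, weights=[1, 1, 1, 5]))
```

Collecting the statistics in a dict keeps each addition to a single line, so another function that returns one value per column can be added the same way.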

Hope this helps!

g.newt
  • Thank you @g.newt, but regarding your suggestion "to use the scipy.stats library to add any advanced statistics to your custom function": can you show me with code how to do it? Maybe by adding the geometric mean as an example. – Andrzej Andrzej Feb 15 '23 at 18:55
  • Sure. I'll edit my answer. – g.newt Feb 15 '23 at 19:51
  • I updated my answer. Is this what you're looking for? – g.newt Feb 15 '23 at 20:33
  • Yes, it is, thank you. I received this warning: "FutureWarning: The 'mad' method is deprecated and will be removed in a future version. To compute the same result, you may do `(df - df.mean()).abs().mean()`." It appeared because I added mad = data.mad() and mad_df = pd.DataFrame({'mad':mad}).T. Where should I place the suggested expression? – Andrzej Andrzej Feb 15 '23 at 21:00
  • That is just a warning, don't worry about it. Also, you should mark the solution correct if it answered your question. – g.newt Feb 15 '23 at 21:16
  • Just asking as I am new here: 1. Is it OK to ask additional questions on this topic when a solution has already been accepted? 2. Is it OK to present what I have done to extend the solution code? I would post it as a new answer to the question, since characters in comments are limited, but is that OK here? – Andrzej Andrzej Feb 16 '23 at 07:36
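
On the 'mad' warning quoted in the comments: the warning text itself gives the replacement expression, and since `DataFrame.mad` was removed in pandas 2.0, using that expression is likely the safer choice. A minimal sketch follows, where the helper name `mean_abs_deviation` is made up for illustration:

```python
import pandas as pd


def mean_abs_deviation(data):
    """Mean absolute deviation per column, without calling DataFrame.mad()."""
    data = pd.DataFrame(data)
    # Expression taken from the FutureWarning text.
    return (data - data.mean()).abs().mean()


df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 10, 20, 20]})

# Same one-row layout as skewness_df and kurtosis_df in describex.
mad_df = pd.DataFrame({'mad': mean_abs_deviation(df)}).T
print(mad_df)
```

Inside describex, `mad_df` would be built next to `skewness_df` and `kurtosis_df` and added to the list of frames that are stacked in the return statement.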