Is there a way to find out if the data is normally distributed using a Python library?

Question

I have a dataset with more than 100 columns, and I want to find out whether the data in each column is normally distributed. If a column is not, I have to make it normally distributed. I am curious whether there is a way to check this programmatically, because checking each column manually is tiresome and confusing. I tried this, but my logic is failing:

from scipy.stats import zscore

def find_normal_dist(_colname):
    # x is assumed to be the DataFrame loaded elsewhere from the dataset
    z = zscore(x[_colname])
    n = len(x)

    # Percentage of values within 1 standard deviation of the mean
    one_std_dev = ((z >= -1) & (z <= 1)).sum() / n * 100
    # Percentage of values between 1 and 2 standard deviations (both tails)
    two_std_dev = (((z > 1) & (z < 2)) | ((z > -2) & (z < -1))).sum() / n * 100
    # Percentage of values between 2 and 3 standard deviations (both tails)
    three_std_dev = (((z > 2) & (z < 3)) | ((z > -3) & (z < -2))).sum() / n * 100

    # For normal data these come out at roughly 68, 27 and 4
    return '1 {} 2 {} 3 {}'.format(round(one_std_dev), round(two_std_dev), round(three_std_dev))

I am using a Kaggle dataset.

score 2 · Answer 1 · answered Dec 13 '21 at 13:40

Investigate a QQ plot, or run Shapiro-Wilk to test if the data are normal.
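For a dataset with many columns, both suggestions can be run in a loop. The following is only a sketch, assuming the data sits in a pandas DataFrame and using scipy.stats.shapiro for the Shapiro-Wilk test and scipy.stats.probplot for the QQ-plot fit. The function name normality_report, the alpha threshold of 0.05, and the demo columns are made up for illustration and are not from the question. Note that SciPy warns that the Shapiro-Wilk p-value may be inaccurate above 5000 samples, and with large samples the test tends to reject even for small departures from normality, so the QQ correlation is worth checking alongside it.

```python
import numpy as np
import pandas as pd
from scipy import stats


def normality_report(df, alpha=0.05):
    """Run Shapiro-Wilk and a QQ-plot fit on every numeric column of df.

    alpha is an illustrative significance threshold, not a recommendation.
    """
    rows = []
    for col in df.select_dtypes(include="number").columns:
        values = df[col].dropna()
        # Shapiro-Wilk needs at least 3 points; constant columns are skipped
        if len(values) < 3 or values.nunique() < 2:
            continue
        w_stat, p_value = stats.shapiro(values)
        # probplot returns the QQ points and a least-squares fit;
        # r close to 1 means the points lie near a straight line
        _, (_, _, r) = stats.probplot(values, dist="norm")
        rows.append({
            "column": col,
            "W": w_stat,
            "p_value": p_value,
            "qq_r": r,
            "looks_normal": bool(p_value > alpha),
        })
    return pd.DataFrame(rows)


# Synthetic demo data (hypothetical columns, not the Kaggle dataset)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "gauss": rng.normal(size=500),
    "skewed": rng.exponential(size=500),
})
report = normality_report(demo)
print(report)
```

A column with a small p_value and a qq_r noticeably below 1 is a candidate for a closer look with an actual QQ plot, which stats.probplot can draw when given a matplotlib axis through its plot argument.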

Alex Reynolds

