I have a dataset with more than 100 columns, and I want to find out whether the data in each column is normally distributed. If a column is not, I have to transform it so that it is. Is there a way to check this programmatically? Checking every column by hand is tedious and confusing. I tried the following, but my logic is failing:
from scipy.stats import zscore

# x is the pandas DataFrame holding the dataset (loaded elsewhere)
def find_normal_dist(_colname):
    z = zscore(x[_colname])  # standardize the column once instead of per filter
    n = len(x)
    # % of values within 1 standard deviation of the mean (about 68 if normal)
    one_std_dev = ((z >= -1) & (z <= 1)).sum() / n * 100
    # % of values between 1 and 2 standard deviations, both sides (about 27 if normal)
    two_std_dev = (((z > 1) & (z < 2)) | ((z > -2) & (z < -1))).sum() / n * 100
    # % of values between 2 and 3 standard deviations, both sides (about 4 if normal)
    three_std_dev = (((z > 2) & (z < 3)) | ((z > -3) & (z < -2))).sum() / n * 100
    return '1 {} 2 {} 3 {}'.format(round(one_std_dev), round(two_std_dev), round(three_std_dev))
I am using a Kaggle dataset.
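
To make the goal concrete, here is a rough sketch of the same standard-deviation band check applied to every numeric column at once. It is only a coarse screen under stated assumptions, not a formal normality test: the names `band_percentages`, `looks_normal`, `EXPECTED` and the `tolerance` parameter are hypothetical, and the synthetic `demo` frame stands in for the real dataset.

```python
import numpy as np
import pandas as pd

# Approximate % of values a normal distribution puts in each |z| band.
EXPECTED = {"within_1": 68.27, "1_to_2": 27.18, "2_to_3": 4.28}

def band_percentages(df):
    """Return, per numeric column, the % of values in each |z| band."""
    numeric = df.select_dtypes(include="number")
    # Standardize all columns at once; pandas skips NaNs in mean/std.
    z = ((numeric - numeric.mean()) / numeric.std(ddof=0)).abs()
    n = numeric.notna().sum()  # non-missing count per column
    return pd.DataFrame({
        "within_1": (z <= 1).sum() / n * 100,
        "1_to_2": ((z > 1) & (z < 2)).sum() / n * 100,
        "2_to_3": ((z > 2) & (z < 3)).sum() / n * 100,
    })

def looks_normal(df, tolerance=3.0):
    """Flag columns whose band shares are all within `tolerance`
    percentage points of EXPECTED (an arbitrary, assumed threshold)."""
    bands = band_percentages(df)
    diff = (bands - pd.Series(EXPECTED)).abs()
    return (diff <= tolerance).all(axis=1)

# Synthetic stand-in data: one normal column, one clearly skewed column.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "gaussian": rng.normal(size=10_000),
    "skewed": rng.exponential(size=10_000),
})
print(band_percentages(demo).round(1))
print(looks_normal(demo))
```

The result is one boolean per column, so the columns that fail the screen can be picked out with `flags[~flags].index`, where `flags = looks_normal(df)`, and inspected or transformed separately.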