I want to perform a difference-in-means test using the summary statistics of two DataFrames.

df1[['sd']].describe()
                sd
count  5000.000000
mean      0.635558
std       0.086109
min       0.492922
25%       0.577885
50%       0.639906
75%       0.688645
max       0.800767

df2[['sd']].describe()
                sd
count  5000.000000
mean      0.640954
std       0.084459
min       0.496823
25%       0.577373
50%       0.644122
75%       0.693863
max       0.798076

I am looking for some function I can call on these summary statistics to tell me if my difference in means is statistically significant.

Lieu Zheng Hong

1 Answer

If you have two independent samples, whether from the same population or from different ones, use the t-test for independent samples.

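This is a two-sided test of the null hypothesis that the two independent samples have equal means. By default, ttest_ind assumes the two populations have equal variances; pass equal_var=False to run Welch's t-test instead.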

from scipy.stats import ttest_ind

ttest_ind(df1['sd'], df2['sd'])

The result contains the t-statistic and the p-value.
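If only the summary statistics are available rather than the raw columns, scipy.stats also provides ttest_ind_from_stats, which takes the mean, standard deviation, and count of each sample. The following is a minimal sketch: the numbers are copied from the describe() output in the question, the variable names are illustrative, and equal_var=False (Welch's test) is an assumption made here, not something the question specifies.

```python
from scipy.stats import ttest_ind_from_stats

# Summary statistics copied from the describe() output in the question.
# describe() reports the sample standard deviation (ddof=1), which is
# what ttest_ind_from_stats expects.
mean1, std1, n1 = 0.635558, 0.086109, 5000
mean2, std2, n2 = 0.640954, 0.084459, 5000

# equal_var=False selects Welch's t-test (an assumption for this sketch);
# use equal_var=True for the pooled-variance Student's t-test.
t_stat, p_value = ttest_ind_from_stats(
    mean1, std1, n1,
    mean2, std2, n2,
    equal_var=False,
)

print(f"t = {t_stat:.4f}, p = {p_value:.4g}")
```

The same values can be read from the DataFrame instead of typed in, for example with s = df1['sd'].describe() followed by s['mean'], s['std'], and s['count'].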

ipj