I am looking for a quick way to get the t-test confidence interval for the difference between means in Python, similar to this in R:

X1 <- rnorm(n = 10, mean = 50, sd = 10)
X2 <- rnorm(n = 200, mean = 35, sd = 14)
# the scenario is similar to my data

t_res <- t.test(X1, X2, alternative = 'two.sided', var.equal = FALSE)    
t_res

Out:

    Welch Two Sample t-test

data:  X1 and X2
t = 1.6585, df = 10.036, p-value = 0.1281
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.539749 17.355816
sample estimates:
mean of x mean of y 
 43.20514  35.79711 

Next:

> print(c(t_res$conf.int[1], t_res$conf.int[2]))
[1] -2.539749 17.355816

I am not finding anything similar in either statsmodels or scipy, which is strange considering the importance of confidence intervals in hypothesis testing (and how much criticism the practice of reporting only p-values has received recently).

IRTFM
Anarcho-Chossid
  • I tagged it both; maybe folks who use R know the answer for Python. Nowadays a lot of people use both. – Anarcho-Chossid Aug 02 '15 at 04:17
  • It's available in statsmodels, but doesn't have a very convenient interface: http://www.statsmodels.org/stable/generated/statsmodels.stats.weightstats.CompareMeans.html – Josef Aug 02 '15 at 04:38
  • Which of those functions gives what I asked for? – Anarcho-Chossid Aug 02 '15 at 06:08
  • Quite a few SO questions give examples; please take a look at [t test](http://stackoverflow.com/questions/2324438/how-to-calculate-the-statistics-t-test-with-numpy) and [confidence interval](http://stackoverflow.com/questions/15033511/compute-a-confidence-interval-from-sample-data) – lrnzcig Aug 03 '15 at 19:55
  • I looked at quite a few SO examples, and none of them address precisely what I want to do. I need to calculate a confidence interval for a t-test of difference between means, not t-test describing my data. – Anarcho-Chossid Aug 03 '15 at 22:45
  • Also see [this answer](https://stats.stackexchange.com/a/475345/241268) for how to code it manually using `numpy`, `scipy` and `pandas`. – Warm_Duscher Aug 26 '22 at 17:30

2 Answers

Here is how to use StatsModels' CompareMeans to calculate the confidence interval for the difference between means:

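import numpy as np
import statsmodels.stats.api as sms

# Two small samples with different variances
X1, X2 = np.arange(10, 21), np.arange(20, 26.5, .5)

cm = sms.CompareMeans(sms.DescrStatsW(X1), sms.DescrStatsW(X2))
# usevar='unequal' gives the Welch interval, like R's default t.test
print(cm.tconfint_diff(usevar='unequal'))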

Output is

(-10.414599391793885, -5.5854006082061138)

and matches R:

> X1 <- seq(10,20)
> X2 <- seq(20,26,.5)
> t.test(X1, X2)

    Welch Two Sample t-test

data:  X1 and X2
t = -7.0391, df = 15.58, p-value = 3.247e-06
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -10.414599  -5.585401
sample estimates:
mean of x mean of y 
       15        23 
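
For readers who want to see what that interval is made of, here is a minimal sketch that computes the Welch confidence interval by hand with NumPy and `scipy.stats.t`. The helper name `welch_ci` is made up for illustration and is not part of either library; it assumes two independent samples and uses the Welch-Satterthwaite approximation for the degrees of freedom.

```python
import numpy as np
from scipy import stats


def welch_ci(a, b, confidence=0.95):
    """Confidence interval for mean(a) - mean(b) without assuming equal variances.

    Hypothetical helper written for this example, not a library function.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error contributed by each sample
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    # Two-sided critical value of Student's t distribution
    t_crit = stats.t.ppf((1 + confidence) / 2, df)
    diff = a.mean() - b.mean()
    return diff - t_crit * se, diff + t_crit * se


X1, X2 = np.arange(10, 21), np.arange(20, 26.5, .5)
low, high = welch_ci(X1, X2)
print(low, high)  # approximately -10.4146 and -5.5854
```

On the same data this should agree with both the StatsModels result and the R output shown here, up to floating-point rounding.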
Ulrich Stern
  • Hey @ulrich-stern, thanks for your answer. Is this CI for the relative difference or the absolute difference? Do you know how we can calculate a CI for relative differences? – CanCeylan Mar 06 '18 at 11:33
  • @CanCeylan, my answer is for the "regular" difference. There is a [Cross Validated question](https://stats.stackexchange.com/q/264929/112208) that suggests the bootstrap in case of relative differences. – Ulrich Stern Mar 07 '18 at 15:18
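
As a rough illustration of the bootstrap idea mentioned in the comment above, the following sketch computes a percentile bootstrap interval for the relative difference between two means. Everything here is an assumption made for the example: the helper name `bootstrap_relative_diff_ci`, the definition of the relative difference as `(mean(a) - mean(b)) / mean(b)`, the number of resamples, and the choice of the plain percentile method. It is not taken from the linked Cross Validated question.

```python
import numpy as np


def bootstrap_relative_diff_ci(a, b, n_boot=10_000, confidence=0.95, seed=0):
    """Percentile bootstrap CI for (mean(a) - mean(b)) / mean(b).

    Hypothetical helper; the relative-difference definition is an assumption.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample each group independently, with replacement
    idx_a = rng.integers(0, a.size, size=(n_boot, a.size))
    idx_b = rng.integers(0, b.size, size=(n_boot, b.size))
    mean_a = a[idx_a].mean(axis=1)
    mean_b = b[idx_b].mean(axis=1)
    # Relative difference for every bootstrap replicate
    rel = (mean_a - mean_b) / mean_b
    alpha = 1 - confidence
    low, high = np.quantile(rel, [alpha / 2, 1 - alpha / 2])
    return low, high


X1, X2 = np.arange(10, 21), np.arange(20, 26.5, .5)
rel_low, rel_high = bootstrap_relative_diff_ci(X1, X2)
print(rel_low, rel_high)
```

The percentile method is the simplest variant; with samples this small the interval may be somewhat too narrow, so a bias-corrected method could be worth considering.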
An alternative answer using pingouin (code basically copied from here and adapted to use Ulrich Stern's variables):

import numpy as np
import pingouin as pg

x1, x2 = np.arange(10, 21), np.arange(20, 26.5, .5)
# Independent (unpaired) two-sample t-test; the result includes the 95% CI
res = pg.ttest(x1, x2, paired=False)
print(res)

prints

            T    dof       tail     p-val            CI95%  cohen-d       BF10  power
T-test -7.039  15.58  two-sided  0.000003  [-10.41, -5.59]    3.009  2.251e+04    1.0
Warm_Duscher