I am using Apache Commons Math 3 to run a Student's t-test for a website project. Suppose I have two samples:

double[] sampleOne = new double[] {134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123};
double[] sampleTwo = new double[] {70, 118, 101, 85, 107, 132, 94};

This data is copied from https://www.statsdirect.com/help/parametric_methods/unpaired_t.htm

I want to calculate the confidence interval shown on that page, for example:

Assuming equal variances 
95% confidence interval for difference between means = -2.193679 to 40.193679

I found the Stack Overflow question "Using Apache Commons Math to determine confidence intervals", which shows this method:

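import org.apache.commons.math3.distribution.TDistribution;
import org.apache.commons.math3.stat.descriptive.StatisticalSummary;

// Half-width of a confidence interval for the mean of a SINGLE sample
private double getConfidenceIntervalWidth(StatisticalSummary statistics, double significance) {
    TDistribution tDist = new TDistribution(statistics.getN() - 1);
    double a = tDist.inverseCumulativeProbability(1.0 - significance / 2);  // two-sided critical value
    return a * statistics.getStandardDeviation() / Math.sqrt(statistics.getN());
}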

This does not seem to work for a two-sample t-test. I did quite a bit of research, but could not find out how to do it with Apache Commons Math 3.

curious1

2 Answers

I am aware that this might be a really late response, but I will try to answer your question. Assuming you have the two unpaired samples sampleOne and sampleTwo (they cannot be paired, since they have different sizes), you can compute the t-statistic as follows:

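import org.apache.commons.math3.stat.descriptive.DescriptiveStatistics;
import org.apache.commons.math3.stat.inference.TestUtils;

DescriptiveStatistics one = new DescriptiveStatistics();
for (double d : sampleOne)
    one.addValue(d);
DescriptiveStatistics two = new DescriptiveStatistics();
for (double d : sampleTwo)
    two.addValue(d);
// Welch t statistic: does NOT assume equal variances
double tStat = TestUtils.t(one, two);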

Note that you can use SummaryStatistics instead of DescriptiveStatistics. If you want the (two-sided) p-value instead, you can do the following:

double pVal = TestUtils.tTest(sampleOne, sampleTwo);
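
TestUtils.t and the two-argument TestUtils.tTest do not assume equal variances (Welch's test). The StatsDirect interval does assume equal variances, and the matching Commons Math methods are homoscedasticT and homoscedasticTTest. The dependency-free sketch below computes both statistics for the question's data so you can see how far apart they are. The class and method names are illustrative only and are not part of Commons Math.

```java
// Dependency-free sketch: Welch vs pooled (equal-variance) two-sample t statistics.
public class WelchVsPooled {
    static final double[] SAMPLE_ONE = {134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123};
    static final double[] SAMPLE_TWO = {70, 118, 101, 85, 107, 132, 94};

    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Unbiased sample variance (divides by n - 1).
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    // Welch statistic: each sample keeps its own variance.
    static double welchT(double[] a, double[] b) {
        return (mean(a) - mean(b)) / Math.sqrt(variance(a) / a.length + variance(b) / b.length);
    }

    // Pooled statistic: assumes both populations share one variance.
    static double pooledT(double[] a, double[] b) {
        int n1 = a.length, n2 = b.length;
        double pooledVar = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooledVar * (1.0 / n1 + 1.0 / n2));
    }

    public static void main(String[] args) {
        System.out.printf("Welch t = %.4f, pooled t = %.4f%n",
                welchT(SAMPLE_ONE, SAMPLE_TWO), pooledT(SAMPLE_ONE, SAMPLE_TWO));
    }
}
```

For this data the pooled t comes out at about 1.891 and the Welch t at about 1.911. That gap is why TestUtils.t will not reproduce the equal-variance numbers on the StatsDirect page.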

Finally, to run the full test at a given confidence level (say double conf = 0.95), call the three-argument overload. It returns true if the null hypothesis of equal means can be rejected at significance level 1 - conf:

boolean reject = TestUtils.tTest(sampleOne, sampleTwo, 1.0 - conf);

As for the lower and upper limits of the interval, Apache Commons Math has no direct support for computing them. The formula from your question can be adapted to an unpaired t-test. Keep in mind, though, that it then assumes your samples' variances are equal (as indicated on the website you linked).
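
Under the equal-variances assumption, the question's single-sample formula carries over with two changes. The standard deviation over the square root of n is replaced by the pooled standard error of the difference, and the degrees of freedom become n1 + n2 - 2. Below is a minimal dependency-free sketch. The class and method names are hypothetical, and the critical value is hard-coded instead of being taken from Commons Math's TDistribution.

```java
// Dependency-free sketch: equal-variance confidence interval for mean1 - mean2.
public class PooledInterval {
    static final double[] SAMPLE_ONE = {134, 146, 104, 119, 124, 161, 107, 83, 113, 129, 97, 123};
    static final double[] SAMPLE_TWO = {70, 118, 101, 85, 107, 132, 94};

    static double mean(double[] x) {
        double sum = 0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    // Sum of squared deviations from the sample mean.
    static double sumSq(double[] x) {
        double m = mean(x);
        double ss = 0;
        for (double v : x) ss += (v - m) * (v - m);
        return ss;
    }

    // Returns {lower, upper}. tCrit must be the (1 - alpha / 2) quantile of Student's t with
    // n1 + n2 - 2 degrees of freedom; with Commons Math it could come from
    // new TDistribution(n1 + n2 - 2).inverseCumulativeProbability(1 - alpha / 2).
    static double[] pooledInterval(double[] a, double[] b, double tCrit) {
        int n1 = a.length, n2 = b.length;
        double pooledVar = (sumSq(a) + sumSq(b)) / (n1 + n2 - 2);
        double se = Math.sqrt(pooledVar * (1.0 / n1 + 1.0 / n2));  // standard error of the difference
        double diff = mean(a) - mean(b);
        double halfWidth = tCrit * se;
        return new double[] {diff - halfWidth, diff + halfWidth};
    }

    public static void main(String[] args) {
        // 2.109816: 97.5% quantile of t with 12 + 7 - 2 = 17 df, hard-coded to avoid the dependency.
        double[] ci = pooledInterval(SAMPLE_ONE, SAMPLE_TWO, 2.109816);
        System.out.printf("95%% CI for difference: %.5f to %.5f%n", ci[0], ci[1]);
    }
}
```

Even with the rounded critical value, the limits agree with the StatsDirect figures (-2.193679 to 40.193679) to about five decimal places.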

nick.katsip

Your idea is correct, but you need the right t statistic, the right standard error to multiply a by, and the right degrees of freedom. If you are assuming equal variances, use

TTest tTest = new TTest();  // org.apache.commons.math3.stat.inference.TTest
double t = tTest.homoscedasticT(sampleOne, sampleTwo);

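to get the t-statistic. Because t is the difference between the means divided by its standard error, you can recover that standard error by dividing the difference between the means by t. This fails if the means are exactly equal, since t is then 0.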

double meanDiff = StatUtils.mean(sampleOne) - StatUtils.mean(sampleTwo);  // org.apache.commons.math3.stat.StatUtils
double tSigma = meanDiff / t;  // standard error of the difference between the means

Then create a t distribution whose degrees of freedom equal the sum of the two sample sizes minus two (12 + 7 - 2 = 17). Do what you were attempting, but multiply by this standard error to get the interval half-width:

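double df = sampleOne.length + sampleTwo.length - 2;  // 17 for this data
double significance = 0.05;                            // 95% confidence
TDistribution tDist = new TDistribution(df);
double a = tDist.inverseCumulativeProbability(1.0 - significance / 2);
double halfWidth = a * tSigma;
// interval: meanDiff - halfWidth to meanDiff + halfWidth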
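
Here are the same steps worked by hand for the question's data. The sample means are 120 and 101, and the sums of squared deviations are 5032 and 2552. The critical value is taken as t with 17 degrees of freedom at 0.975, rounded to 2.109816. These figures are my own arithmetic from the listed samples, not output copied from Commons Math:

```latex
\begin{aligned}
s_p^2 &= \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{5032 + 2552}{12 + 7 - 2} \approx 446.1176 \\
\mathrm{SE} &= s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = \sqrt{446.1176\left(\frac{1}{12} + \frac{1}{7}\right)} \approx 10.045276 \\
t &= \frac{\bar{x}_1 - \bar{x}_2}{\mathrm{SE}} = \frac{120 - 101}{10.045276} \approx 1.8914,
\qquad \texttt{tSigma} = \frac{\bar{x}_1 - \bar{x}_2}{t} = \mathrm{SE} \\
(\bar{x}_1 - \bar{x}_2) \pm t_{0.975,\,17}\,\mathrm{SE} &= 19 \pm 2.109816 \times 10.045276 \approx 19 \pm 21.1937
\;\Longrightarrow\; (-2.1937,\ 40.1937)
\end{aligned}
```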

For the unequal-variances case, you need to compute approximate degrees of freedom; see the protected method df in the Commons Math TTest source for how. The code above reproduces the results from your link for the equal-variances case. For unequal variances, I think the reference has an error: it appears to use 17 degrees of freedom in the t distribution while the statistic itself is computed using pooled variance.
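
For unequal variances, the approximate degrees of freedom come from the Welch-Satterthwaite formula. As far as I can tell, this is what the protected df method in Commons Math's TTest computes, but check the source to be sure. Below is a small standalone sketch; the class and method names are hypothetical.

```java
// Welch–Satterthwaite approximate degrees of freedom for two samples with unequal variances.
public class WelchDf {
    // v1, v2: sample variances (n - 1 denominator); n1, n2: sample sizes.
    static double welchDf(double v1, double v2, double n1, double n2) {
        double a = v1 / n1;  // squared standard error of the first mean
        double b = v2 / n2;  // squared standard error of the second mean
        return (a + b) * (a + b) / (a * a / (n1 - 1) + b * b / (n2 - 1));
    }

    public static void main(String[] args) {
        // Sample variances of the question's data: 5032 / 11 and 2552 / 6.
        double df = welchDf(5032.0 / 11, 2552.0 / 6, 12, 7);
        System.out.printf("Welch-Satterthwaite df = %.2f%n", df);
    }
}
```

For the question's data this gives roughly 13.08 degrees of freedom rather than 17. A Welch interval would pair that (non-integer) value with the unpooled standard error sqrt(v1/n1 + v2/n2).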

Phil Steitz