71

Could you please suggest any simple Java statistics packages?

I don't necessarily need any of the advanced stuff. I was quite surprised that there does not appear to be a function to calculate the Mean in the java.lang.Math package...

What are you guys using for this?


EDIT

Regarding:

How hard is it to write a simple class that calculates means and standard deviations?

Well, not hard. I only asked this question after having hand-coded these. But it only added to my Java frustration not to have these simplest functions available at hand when I needed them. I don't remember the formula for calculating stdev by heart :)

Peter Perháč
  • 8
    You shouldn't have to remember it by heart. Any language worth its salt should make it straightforward to access basic descriptive statistics. Seeing people ask "how hard it is to write a standard deviation" function is funny... – Adam Hughes Dec 28 '15 at 20:17

5 Answers

91

Apache Commons Math, specifically DescriptiveStatistics and SummaryStatistics.

sladstaetter
John Paulett
  • thanks. I take it there isn't really a reason for looking any further. You're satisfied with Apache Commons, or is it just so-so, good-enough, could-be-better? – Peter Perháč Nov 14 '09 at 22:50
  • Just discovered this library, precisely for calculating mean, standard deviation. Very easy to pick up. +1 – Grundlefleck Nov 14 '09 at 23:02
  • I've found it to fit my needs well. While I've never personally run into this issue or cared, I have a coworker who found it to be slower when computing the mean of an array than just doing a loop to add the values then dividing by the size of the array. However, his code was averaging things that would likely never cause integer overflow errors. I assume that Commons Math is a little smarter and won't let integers overflow. – John Paulett Nov 14 '09 at 23:04
  • 2
    The APIs use `double` not `int` or `long` so integer overflow is not an issue. However, they cannot handle value sets with more than `Integer.MAX_VALUE` doubles. – Stephen C Nov 14 '09 at 23:27
  • You should not have to store an array to calculate a mean or standard deviation. It's easy to do both without having to take up all that memory. – duffymo Nov 15 '09 at 00:42
  • How hard is it to write a simple class that calculates means and standard deviations? Must there be a library for everything? – duffymo Nov 15 '09 at 00:42
  • @duffymo, the original data was in an array, so it was just keeping a running total and then dividing by the size of the array. – John Paulett Nov 15 '09 at 03:39
  • Yes, I realize that. All I'm saying is that an array isn't necessary. It's not even desirable if you're trying to minimize the amount of memory you consume. – duffymo Nov 15 '09 at 04:10
  • @duffymo As a classic Java programmer, I am definitely not concerned by stuff like how much memory do my programs consume. (<-- joking, of course) As to `How hard is it to write a simple class that calculates means and standard deviations?` well, not hard. I only asked this question *after* having hand-coded these. But it only added to my Java frustration not to have these simplest functions available at my hand when I needed them. I don't remember the formula for calculating stdev by heart :) – Peter Perháč Nov 15 '09 at 10:36
  • @duffymo - my reading of the Apache library APIs is that they require you to pass the values to be averaged in an array. – Stephen C Nov 15 '09 at 10:45
  • 1
    @MasterPeter - but I'm sure you remember the URL for Wikipedia by heart :-) :-) – Stephen C Nov 15 '09 at 10:46
  • @duffymo, while I also would have found it easy to write the functions I used in commons-math, stupid mistakes can be made by anyone, at any time. Sometimes I prefer not to leave it to chance. Also, in some cases it's preferable to up the memory footprint in exchange for a tested solution. All depends on the situation I guess... – Grundlefleck Nov 15 '09 at 11:23
  • @Grundlefleck - I agree that everyone makes stupid mistakes, and I realize the value of libraries, but a simple mean and standard deviation calculator are low on the risk scale. It's easy to write, easy to test, and put aside. There's an argument that says minimizing dependencies is a good idea, too. Why add another library to your app when it's so easy to roll your own? – duffymo Nov 15 '09 at 16:21
  • @Stephen C - agreed. I'm saying that's fine when you have a reasonable number of values, but as the array size grows you'll have a problem storing them. What do you do in the case of a runtime app that you want to keep a running tab on mean and standard deviation of values as they arrive? Your array won't be very useful in that situation. – duffymo Nov 15 '09 at 17:53
  • 14
    Just to save people a few clicks: DescriptiveStatistics is the one that holds all of the values you send it in memory, and SummaryStatistics does not hold them in memory. – Michael Rusch Sep 08 '11 at 20:45
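The point raised in the comments above, that a mean and standard deviation can be tracked as values arrive without storing them in an array, can be sketched with Welford's online algorithm. This is only an illustration, not code from the thread or from Commons Math; the class name `RunningStats` and its method names are hypothetical.

```java
/**
 * Hypothetical sketch: running mean and standard deviation using
 * Welford's online algorithm. No input values are stored.
 */
public class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    /** Folds one new value into the running statistics. */
    public void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the already-updated mean
    }

    public long getCount() {
        return count;
    }

    /** Returns NaN when no values have been added. */
    public double getMean() {
        return count > 0 ? mean : Double.NaN;
    }

    /** Population standard deviation (divides by N). */
    public double getPopulationStdDev() {
        return count > 0 ? Math.sqrt(m2 / count) : Double.NaN;
    }

    /** Sample standard deviation (divides by N - 1). */
    public double getSampleStdDev() {
        return count > 1 ? Math.sqrt(m2 / (count - 1)) : Double.NaN;
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            stats.add(v);
        }
        System.out.println("mean  = " + stats.getMean());
        System.out.println("stdev = " + stats.getPopulationStdDev());
    }
}
```

Because each update only touches three fields, memory use stays constant however many values arrive, and the update avoids subtracting two large sums, which is where the naive sum-of-squares approach tends to lose precision.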
25

Since Java SE 8, a number of classes have been added to the platform:
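The answer's original list of classes is not reproduced here. As one example that certainly exists in Java SE 8, `java.util.DoubleSummaryStatistics` collects count, sum, min, max and average; it does not provide a standard deviation. A minimal sketch follows, with the class name `SummaryDemo` and the helper `summarize` being hypothetical names chosen for illustration:

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

public class SummaryDemo {
    /** Collects count, sum, min, max and average for the given values. */
    static DoubleSummaryStatistics summarize(double[] values) {
        return DoubleStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics stats = summarize(new double[] {1, 2, 3, 4});
        System.out.println("count   = " + stats.getCount());
        System.out.println("average = " + stats.getAverage());
        System.out.println("min     = " + stats.getMin());
        System.out.println("max     = " + stats.getMax());
    }
}
```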

Peter Perháč
18

Just responding to this part of the question:

I was quite surprised that there does not appear to be a function to calculate the Mean in the java.lang.Math package...

I don't think I was surprised to find this. There are a lot of "useful algorithms" that the Java class libraries do not implement. They do not implement everything. And in this, they are no different from other programming languages.

Actually, it would be a bad thing if Sun did try to implement too much in J2SE:

  1. It would take more designer / developer / technical documenter time ... with no clear "return on investment".

  2. It would increase the Java footprint; e.g. the size of "rt.jar". (Or if they tried to mitigate that, it would result in more platform complexity ... )

  3. For things in the mathematical space, you often need to implement the algorithms in different ways (with different APIs) to cater for different requirements.

  4. For complex things, it may be better for Sun not to try to "standardise" the APIs, but leave it to some other interested / skilled group to do it; e.g. the Apache folks.

Stephen C
5
public class stdevClass {
    public static void main(String[] args) {
        int[] list = {1, -2, 4, -4, 9, -6, 16, -8, 25, -10};
        double stdevResult = stdev(list);
        System.out.println(stdevResult);
    }

    // Population standard deviation (divides by N, not N - 1).
    public static double stdev(int[] list) {
        // First pass: compute the mean.
        double sum = 0.0;
        for (int i : list) {
            sum += i;
        }
        double mean = sum / list.length;

        // Second pass: accumulate the squared deviations from the mean.
        double num = 0.0;
        for (int i : list) {
            num += Math.pow(i - mean, 2);
        }

        return Math.sqrt(num / list.length);
    }
}
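For reference, the two passes in the code above appear to correspond to the population standard deviation. The symbols are assumptions for illustration: $x_i$ are the array elements, $N$ is `list.length`, and $\mu$ is the mean.

```latex
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
\qquad
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
```

The sample standard deviation differs only in dividing by $N - 1$ instead of $N$.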
Belphegor
hodan_egal
  • This does not avoid overflow or numerical instability. It is much better to use a library that takes care of these things for you. – Imran Mar 07 '23 at 18:21
1

I don't think Java has a built-in method or class for this; we have to write it ourselves. For your requirement, this code should help: Calculate Standard Deviation in Java

user889392
  • 1
    Answered two years after John Paulett's answer. Clearly there are libraries available to do it. I wouldn't implement your own unless for educational purposes. – cowls Jun 24 '15 at 14:22