Possible Duplicate:
Simple statistics - Java packages for calculating mean, standard deviation, etc

I have a vector of some doubles (1.1,2,3,5). How can I calculate the variance, median, and standard deviation?

Java, C++, or even pseudocode would do.

Community
124697
  • [Already answered for Java](http://stackoverflow.com/questions/1735870/simple-statistics-java-packages-for-calculating-mean-standard-deviation-etc) – Fernando Miguélez Nov 02 '11 at 23:18
  • You should always demonstrate your _bona fide_ effort at answering the question yourself. This is particularly true for `[homework]` questions! Please try harder! – mjv Nov 02 '11 at 23:23
  • [Variance in C++](http://stackoverflow.com/questions/1721980/calculating-variance-with-large-numbers/1723071). If memory serves, the standard deviation is the square root of the variance. `std::nth_element` can find the median. For the mean use `std::accumulate` and `whatever.size()` – Jerry Coffin Nov 02 '11 at 23:25
  • this shouldn't be marked as a duplicate. This question asks for code. The "duplicate" question asks for a library... – Krimson Mar 25 '15 at 08:20
  • Those algorithms can be implemented easily based on their definitions: [Variance](http://en.wikipedia.org/wiki/Variance), [Median](http://en.wikipedia.org/wiki/Median), [Mean](http://en.wikipedia.org/wiki/Mean), and [Standard deviation](http://en.wikipedia.org/wiki/Standard_deviation). Perhaps I simply don't understand the question, but you should be fine if you follow the directions in those articles. – Cam Nov 02 '11 at 23:18

2 Answers

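import java.util.Arrays;

public class Statistics {
    double[] data;
    int size;

    public Statistics(double[] data) {
        this.data = data;
        size = data.length;
    }

    // Arithmetic mean: sum of the values divided by their count.
    double getMean() {
        double sum = 0.0;
        for (double a : data)
            sum += a;
        return sum / size;
    }

    // Sample variance: divides by size - 1 (Bessel's correction),
    // so it needs at least two values to be meaningful.
    double getVariance() {
        double mean = getMean();
        double temp = 0;
        for (double a : data)
            temp += (a - mean) * (a - mean);
        return temp / (size - 1);
    }

    // Standard deviation is the square root of the variance.
    double getStdDev() {
        return Math.sqrt(getVariance());
    }

    // Median: sorts a copy so the caller's array is left in its original order.
    public double median() {
        double[] sorted = data.clone();
        Arrays.sort(sorted);
        if (sorted.length % 2 == 0)
            return (sorted[(sorted.length / 2) - 1] + sorted[sorted.length / 2]) / 2.0;
        return sorted[sorted.length / 2];
    }
}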
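
As an alternative to the sort-based `median()` above, a selection algorithm can find the median without fully sorting the data. The following is a minimal sketch of quickselect and is not part of the original answer; the class name `QuickselectMedian` and its helper methods are hypothetical, and it assumes a non-empty input array. It uses the middle element as the pivot for simplicity, which gives linear time on average but can degrade to quadratic time on unlucky inputs.

```java
// Hypothetical sketch: median via quickselect instead of a full sort.
class QuickselectMedian {

    // Returns the k-th smallest element (0-based), partially reordering a in place.
    static double select(double[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi);
            if (k == p) return a[k];
            if (k < p) hi = p - 1; else lo = p + 1;
        }
        return a[lo];
    }

    // Lomuto partition around the middle element; returns the pivot's final index.
    static int partition(double[] a, int lo, int hi) {
        swap(a, lo + (hi - lo) / 2, hi);
        double pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                swap(a, i, store);
                store++;
            }
        }
        swap(a, store, hi);
        return store;
    }

    static void swap(double[] a, int i, int j) {
        double t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    // Works on a copy so the caller's array is not reordered. Assumes data is non-empty.
    static double median(double[] data) {
        double[] a = data.clone();
        int n = a.length;
        double upper = select(a, n / 2);
        if (n % 2 == 1) return upper;
        // After select, everything left of index n/2 is <= upper,
        // so the lower middle value is the maximum of that left part.
        double lower = a[0];
        for (int i = 1; i < n / 2; i++) lower = Math.max(lower, a[i]);
        return (lower + upper) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(median(new double[] {1.1, 2, 3, 5}));
    }
}
```

For the question's data (1.1, 2, 3, 5), `QuickselectMedian.median` returns 2.5, the average of the two middle values.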
Dennis
  • why does median not work on data like the rest? – Mooing Duck Nov 02 '11 at 23:33
  • Feel free to edit, I snatched median from a website. – Dennis Nov 02 '11 at 23:33
  • @JW8: No it's not a nice implementation example. Why is `size` of type `double`? The implementation of `median()` is also horrible. He copies and sorts the whole data-structure just to find the median :/. You should use a selection algorithm such as quickselect – Moncef M. Sep 09 '13 at 21:34
  • @fireboot Feel free to provide said implementation. I'm trying to keep it simple. – Dennis Jul 17 '14 at 17:25
  • One problem with this implementation is that if you want all three, you will make three passes on the data. – static_rtti Jul 24 '15 at 08:03
  • You have to pick and choose your battles; for this case I focused on modularity. – Dennis Jul 24 '15 at 23:52
  • Population mean - worth pointing out that sample mean may be more appropriate, suggest you read about Bessel correction. – ChuckCottrill Nov 04 '16 at 04:34
  • I tried this and the results I get do not match WolframAlpha output. Mean is correct, but variance is not. I used `data.length - 1` instead of `size` when calculating and got correct results https://pastebin.com/AHA4zkyT – Asalas77 Jun 07 '17 at 22:29
  • @Asalas77 is correct, variance is sum of squares of distances from mean divided by number of values - 1. OP should really update this answer. Nvm I have a high enough score I just edited the answer myself. – mclaassen Jun 12 '17 at 19:14
  • Why for the mean you use `size` but for the variance you use `size - 1`? @Asalas77 `values - 1`? Where is that formula? It is always divided by `n` not `n - 1`. – PedroD Sep 04 '17 at 15:17
  • @PedroD that's just the formula, it has `- 1` in it. http://www.statisticshowto.com/find-variance-minitab/ – Asalas77 Sep 04 '17 at 22:50
  • @Asalas77 strange, depending on where you look you find versions with `n` and `n - 1`, eg: https://www.easycalculation.com/statistics/learn-sample-variance.php – PedroD Sep 04 '17 at 22:56
  • I thought the naive algorithm accumulates floating point errors – Phlip Jan 22 '20 at 01:48
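
Several comments above raise the cost of making separate passes over the data and the accumulation of floating-point error. One commonly used remedy is Welford's online algorithm, which updates the mean and the sum of squared deviations one value at a time. The sketch below illustrates that technique and is not the answer's code; `RunningStats` and its method names are hypothetical.

```java
// Hypothetical sketch: single-pass mean and variance using Welford's algorithm.
class RunningStats {
    private long count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the current mean

    // Folds one value into the running statistics.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean); // uses the already-updated mean
    }

    double getMean() {
        return count > 0 ? mean : Double.NaN;
    }

    // Divides by n - 1 (Bessel's correction); undefined for fewer than two values.
    double getSampleVariance() {
        return count > 1 ? m2 / (count - 1) : Double.NaN;
    }

    // Divides by n; undefined for an empty data set.
    double getPopulationVariance() {
        return count > 0 ? m2 / count : Double.NaN;
    }

    double getSampleStdDev() {
        return Math.sqrt(getSampleVariance());
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        for (double x : new double[] {1.1, 2, 3, 5}) {
            stats.add(x);
        }
        System.out.println("mean = " + stats.getMean());
        System.out.println("sample variance = " + stats.getSampleVariance());
        System.out.println("sample std dev = " + stats.getSampleStdDev());
    }
}
```

With the question's data (1.1, 2, 3, 5) this yields a mean of about 2.775, a sample variance of about 2.8025, and a population variance of about 2.1019.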

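To calculate the mean, loop through the array of numbers, keeping a running sum and a count of the elements, then return the sum divided by the count.

// Returns the arithmetic mean of the given numbers (NaN for an empty array).
static double mean(double[] numbers) {
    double sum = 0.0;
    int length = 0;

    for (double number : numbers) {
        sum += number;
        length++;
    }

    return sum / length;
}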

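Variance is calculated similarly: make a second pass that sums the squared differences from the mean, then divide by the count (population variance) or by the count minus one (sample variance). Standard deviation is simply the square root of the variance: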

double stddev = Math.sqrt( variance );
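
The "calculated similarly" step can be sketched as follows: a second loop sums the squared differences from the mean. This is a minimal illustration rather than the answer author's code; the class `VarianceSketch` and the `sample` flag are hypothetical names. Dividing by n - 1 gives the sample variance (Bessel's correction) and dividing by n gives the population variance, which is why both forms appear in references.

```java
// Hypothetical sketch: two-pass mean, variance, and standard deviation.
class VarianceSketch {

    // First pass: arithmetic mean.
    static double mean(double[] numbers) {
        double sum = 0.0;
        for (double number : numbers) {
            sum += number;
        }
        return sum / numbers.length;
    }

    // Second pass: sum of squared differences from the mean.
    // sample == true divides by n - 1, otherwise by n.
    static double variance(double[] numbers, boolean sample) {
        double mean = mean(numbers);
        double sumOfSquares = 0.0;
        for (double number : numbers) {
            double diff = number - mean;
            sumOfSquares += diff * diff;
        }
        int divisor = sample ? numbers.length - 1 : numbers.length;
        return sumOfSquares / divisor;
    }

    static double stdDev(double[] numbers, boolean sample) {
        return Math.sqrt(variance(numbers, sample));
    }

    public static void main(String[] args) {
        double[] numbers = {1.1, 2, 3, 5};
        System.out.println("sample variance = " + variance(numbers, true));
        System.out.println("population variance = " + variance(numbers, false));
        System.out.println("sample std dev = " + stdDev(numbers, true));
    }
}
```

For the question's data (1.1, 2, 3, 5), the sample variance is about 2.8025 and the population variance is about 2.1019.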
tskuzzy