To clarify upfront: I am looking for a way to calculate the standard deviation (SD) using Streams. I already have a working method that calculates and returns the SD, but it does not use Streams.

The dataset I am working with closely matches the one shown in Link. As shown in that link, I am able to group my data and get the average, but I cannot figure out how to get the SD.

Code

outPut.stream()
            .collect(Collectors.groupingBy(e -> e.getCar(),
                    Collectors.averagingDouble(e -> (e.getHigh() - e.getLow()))))
            .forEach((car,avgHLDifference) -> System.out.println(car+ "\t" + avgHLDifference));

I also checked Link on DoubleSummaryStatistics, but it does not seem to help with the SD.
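
For context, here is a minimal sketch of what the built-in `DoubleSummaryStatistics` does expose (count, sum, min, max, average), which is why it is not enough on its own for the SD. The class name `SummaryStatsDemo`, the helper `summarize`, and the sample values are illustrative assumptions, not part of the question.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.stream.Collectors;

public class SummaryStatsDemo {

    // Collects the built-in summary for a list of values (hypothetical helper).
    static DoubleSummaryStatistics summarize(List<Double> values) {
        return values.stream()
                .collect(Collectors.summarizingDouble(Double::doubleValue));
    }

    public static void main(String[] args) {
        // Hypothetical sample data, chosen only for illustration.
        DoubleSummaryStatistics stats = summarize(Arrays.asList(2.0, 4.0, 6.0));

        // These five values are all the class tracks; there is no variance or SD getter.
        System.out.println("count   = " + stats.getCount());
        System.out.println("sum     = " + stats.getSum());
        System.out.println("min     = " + stats.getMin());
        System.out.println("max     = " + stats.getMax());
        System.out.println("average = " + stats.getAverage());
    }
}
```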

iCoder

2 Answers

You can use a custom collector for this task that keeps track of the sum of squares. The built-in DoubleSummaryStatistics collector does not track it. This was discussed by the expert group in this thread but ultimately not implemented. The difficulty when calculating the sum of squares is the potential overflow when squaring the intermediate results.
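
For reference, the sum of squares is sufficient because the population variance can be written in terms of the count, the sum, and the sum of squares. This is the standard identity, stated here for the population SD (dividing by n, not n - 1):

```latex
\sigma^2
  = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
  = \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; \bar{x}^2,
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
\sigma = \sqrt{\sigma^2}
```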

static class DoubleStatistics extends DoubleSummaryStatistics {

    private double sumOfSquare = 0.0d;
    private double sumOfSquareCompensation; // Low order bits of sum
    private double simpleSumOfSquare; // Used to compute right sum for non-finite inputs

    @Override
    public void accept(double value) {
        super.accept(value);
        double squareValue = value * value;
        simpleSumOfSquare += squareValue;
        sumOfSquareWithCompensation(squareValue);
    }

    public DoubleStatistics combine(DoubleStatistics other) {
        super.combine(other);
        simpleSumOfSquare += other.simpleSumOfSquare;
        sumOfSquareWithCompensation(other.sumOfSquare);
        // The compensation term holds the negated low-order bits, so subtract it
        sumOfSquareWithCompensation(-other.sumOfSquareCompensation);
        return this;
    }

    // Kahan summation: adds value while tracking the rounding error
    private void sumOfSquareWithCompensation(double value) {
        double tmp = value - sumOfSquareCompensation;
        double velvel = sumOfSquare + tmp; // Little wolf of rounding error
        sumOfSquareCompensation = (velvel - sumOfSquare) - tmp;
        sumOfSquare = velvel;
    }

    public double getSumOfSquare() {
        // Subtract the compensation: it accumulates the negated rounding error
        double tmp = sumOfSquare - sumOfSquareCompensation;
        if (Double.isNaN(tmp) && Double.isInfinite(simpleSumOfSquare)) {
            return simpleSumOfSquare;
        }
        return tmp;
    }

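    public final double getStandardDeviation() {
        // Population SD; clamp at 0 so rounding error cannot make the variance negative (sqrt would return NaN)
        return getCount() > 0 ? Math.sqrt(Math.max(0.0d, (getSumOfSquare() / getCount()) - Math.pow(getAverage(), 2))) : 0.0d;
    }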

}

Then you can use this class as follows:

Map<String, Double> standardDeviationMap =
    list.stream()
        .collect(Collectors.groupingBy(
            e -> e.getCar(),
            Collectors.mapping(
                e -> e.getHigh() - e.getLow(),
                Collector.of(
                    DoubleStatistics::new,
                    DoubleStatistics::accept,
                    DoubleStatistics::combine,
                    d -> d.getStandardDeviation()
                )
            )
        ));

This will collect the input list into a map whose values are the standard deviation of high - low for each key.
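
As a self-contained illustration of the same grouping idea, the sketch below keeps both the average and the SD per key in a single pass. It is a simplified stand-in, not the class above: `Stats` uses plain sums without Kahan compensation, and `Quote`, `Stats`, `STATS_COLLECTOR`, and `statsByCar` are hypothetical names. Only `getCar()`, `getHigh()`, and `getLow()` come from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class GroupedStdDevDemo {

    // Hypothetical element type; only the three getters mirror the question.
    static final class Quote {
        private final String car;
        private final double high;
        private final double low;
        Quote(String car, double high, double low) { this.car = car; this.high = high; this.low = low; }
        String getCar() { return car; }
        double getHigh() { return high; }
        double getLow() { return low; }
    }

    // Simplified accumulator: plain sums, no compensation for rounding error.
    static final class Stats {
        long count;
        double sum;
        double sumOfSquares;

        void accept(double value) { count++; sum += value; sumOfSquares += value * value; }

        Stats combine(Stats other) {
            count += other.count;
            sum += other.sum;
            sumOfSquares += other.sumOfSquares;
            return this;
        }

        double getAverage() { return count > 0 ? sum / count : 0.0; }

        // Population SD, clamped at 0 to guard against tiny negative variances.
        double getStandardDeviation() {
            if (count == 0) return 0.0;
            double mean = getAverage();
            return Math.sqrt(Math.max(0.0, sumOfSquares / count - mean * mean));
        }
    }

    static final Collector<Double, Stats, Stats> STATS_COLLECTOR =
            Collector.of(Stats::new, Stats::accept, Stats::combine);

    // Groups by car and keeps the full Stats object, so average and SD are both available.
    static Map<String, Stats> statsByCar(List<Quote> quotes) {
        return quotes.stream().collect(Collectors.groupingBy(
                Quote::getCar,
                Collectors.mapping(q -> q.getHigh() - q.getLow(), STATS_COLLECTOR)));
    }

    public static void main(String[] args) {
        List<Quote> quotes = Arrays.asList(
                new Quote("A", 12.0, 10.0), new Quote("A", 14.0, 10.0), new Quote("B", 5.0, 2.0));
        statsByCar(quotes).forEach((car, s) ->
                System.out.println(car + "\t" + s.getAverage() + "\t" + s.getStandardDeviation()));
    }
}
```

Keeping the accumulator as the map value, rather than finishing to a `Double`, is one way to avoid running a second stream for the average.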

Tunaki
  • Thank you very much. I am able to get the SD. I am now checking whether I can collect both averagingDouble and SD (like car, averageHL, SD) in the same stream() call instead of two streams. – iCoder Mar 28 '16 at 14:56
  • @iCoder Yes, the `DoubleStatistics` in this answer collects both the SD and the average. You could have a `Map` with all the info. – Tunaki Mar 28 '16 at 14:57
  • An interesting fact regarding overflow: nobody cares that `LongSummaryStatistics` actually overflows the sum, so `LongStream.of(Long.MAX_VALUE, Long.MAX_VALUE).summaryStatistics().getAverage()` is `-1.0`. The chances of hitting this overflow are, in my opinion, higher than the chances of hitting a sum-of-squares overflow... – Tagir Valeev Mar 28 '16 at 15:16
  • @Tunaki Not quite sure what mistake I am making, but the moment I change Map to Map I get the error message "cannot resolve getScrip()" in the groupingBy. I guess I am making some elementary mistake; I need to ponder more, I suppose. – iCoder Mar 28 '16 at 16:34
  • In Java 1.8.0_92, the example use throws the error: Type mismatch: cannot convert from Map to Map – simpleuser Jul 17 '17 at 21:21
  • They could also just have used BigDecimal. I really do not understand why Java is not learning from Scala, Python and R ... the whole collections implementation lacks so many things .... – KIC Oct 16 '18 at 07:46
  • This answer could as well just compute the square of standard deviation incrementally without overflowing anything. `stddev = (stddev * prevcount + newsquare) / count` not sure why this method wasn't accepted by the expert group. – nurettin Mar 28 '19 at 10:19
  • What if I just want to return the std of a list of numbers? Specifying `.collect(Collector.of(...)...)` does not work ("the method collect is not applicable for the arguments..") – Corel Nov 01 '21 at 21:42
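
Regarding the last comment, here is a sketch for a plain list of numbers. The comment does not show the failing code, so the cause is an assumption: a `Collector` can only be passed to `Stream.collect`, while primitive streams such as `DoubleStream` offer only the three-argument `collect(supplier, accumulator, combiner)`. The sketch uses that three-argument form with a plain `{count, sum, sumOfSquares}` array (no overflow or rounding compensation); `ListStdDevDemo` and `stdDev` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

public class ListStdDevDemo {

    // Population SD of a list of numbers; returns 0.0 for an empty list.
    static double stdDev(List<? extends Number> values) {
        double[] acc = values.stream()
                .mapToDouble(Number::doubleValue)
                .collect(
                        () -> new double[3],                                      // {count, sum, sumOfSquares}
                        (a, v) -> { a[0]++; a[1] += v; a[2] += v * v; },          // accumulate one value
                        (a, b) -> { a[0] += b[0]; a[1] += b[1]; a[2] += b[2]; }); // merge partial results
        if (acc[0] == 0) {
            return 0.0;
        }
        double mean = acc[1] / acc[0];
        // Clamp at 0 so rounding error cannot produce a negative variance.
        return Math.sqrt(Math.max(0.0, acc[2] / acc[0] - mean * mean));
    }

    public static void main(String[] args) {
        // Hypothetical sample data, chosen only for illustration.
        System.out.println(stdDev(Arrays.asList(2, 4, 4, 4, 5, 5, 7, 9)));
    }
}
```

If a `Collector` is preferred instead, calling `.boxed()` on the primitive stream first turns it into a `Stream<Double>` that accepts one.
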
You can use this custom Collector:

private static final Collector<Double, double[], Double> VARIANCE_COLLECTOR = Collector.of( // See https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
        () -> new double[3], // {count, mean, M2}
        (acu, d) -> { // See chapter about Welford's online algorithm and https://math.stackexchange.com/questions/198336/how-to-calculate-standard-deviation-with-streaming-inputs
            acu[0]++; // Count
            double delta = d - acu[1];
            acu[1] += delta / acu[0]; // Mean
            acu[2] += delta * (d - acu[1]); // M2
        },
        (acuA, acuB) -> { // See chapter about "Parallel algorithm" : only called if stream is parallel ...
            double count = acuA[0] + acuB[0];
            if (count == 0) return acuA; // Both partitions empty: avoid 0/0 = NaN
            double delta = acuB[1] - acuA[1];
            acuA[2] = acuA[2] + acuB[2] + delta * delta * acuA[0] * acuB[0] / count; // M2
            acuA[1] += delta * acuB[0] / count;  // Mean
            acuA[0] = count; // Count
            return acuA;
        },
        acu -> acu[2] / (acu[0] - 1.0), // Sample variance = M2 / (count - 1); NaN for a single value
        Collector.Characteristics.UNORDERED); // Qualified so that no static import is needed

Then simply call this collector on your stream:

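// boxed() exists only on primitive streams, so map the elements to double first
double stdDev = Math.sqrt(outPut.stream().mapToDouble(e -> e.getHigh() - e.getLow()).boxed().collect(VARIANCE_COLLECTOR));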
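
A runnable sketch of the same Welford-style collector on hypothetical sample data, run both sequentially and in parallel so that the combiner is exercised too. The names `WelfordDemo`, `SAMPLE_VARIANCE`, and `sampleStdDev` are illustrative assumptions. Note that it yields the sample variance (dividing by count - 1), unlike the population formula in the other answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

public class WelfordDemo {

    // Accumulator layout: {count, mean, M2}
    static final Collector<Double, double[], Double> SAMPLE_VARIANCE = Collector.of(
            () -> new double[3],
            (acc, d) -> {
                acc[0]++;                          // count
                double delta = d - acc[1];
                acc[1] += delta / acc[0];          // running mean
                acc[2] += delta * (d - acc[1]);    // M2: sum of squared deviations
            },
            (a, b) -> {
                double count = a[0] + b[0];
                if (count == 0) return a;          // both partitions empty: avoid 0/0
                double delta = b[1] - a[1];
                a[2] += b[2] + delta * delta * a[0] * b[0] / count;
                a[1] += delta * b[0] / count;
                a[0] = count;
                return a;
            },
            acc -> acc[2] / (acc[0] - 1.0),        // sample variance; NaN for a single value
            Collector.Characteristics.UNORDERED);

    static double sampleStdDev(List<Double> values) {
        return Math.sqrt(values.stream().collect(SAMPLE_VARIANCE));
    }

    public static void main(String[] args) {
        // Hypothetical sample data, chosen only for illustration.
        List<Double> values = Arrays.asList(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        System.out.println("sequential: " + sampleStdDev(values));
        System.out.println("parallel:   " + Math.sqrt(values.parallelStream().collect(SAMPLE_VARIANCE)));
    }
}
```
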
sebyku