5

Is there any way to compute the following in a single pass over the array? I simply want to get two different results out of one stream.

double sum = Arrays.stream(doubles).sum();
double sumOfSquares = Arrays.stream(doubles).map(d -> d * d).sum();
Brian Goetz
HectorBarbossa
  • 3
    If you find yourself needing lots of these kinds of statistics then subclassing [DoubleSummaryStatistics](https://docs.oracle.com/javase/8/docs/api/java/util/DoubleSummaryStatistics.html) might make sense. – the8472 May 20 '16 at 16:45
  • 2
    Related question (using the idea from @the8472) http://stackoverflow.com/questions/36263352/java-streams-standard-deviation – Tunaki May 20 '16 at 17:48
  • 6
    The recommended approach here is indeed to subclass `DoubleSummaryStatistics`. That said, be careful. We considered including sum-of-squares in DSS, but chose not to because (a) it's more computation than many users want, but more importantly (b) it is very easy to get in numerical trouble with floating point when calculating variance by sum-of-squares. (Squaring makes big numbers bigger and small numbers smaller, leading to risk of losing data when you add them.) See Knuth AOCP, vol 2, sec 4.2.2 for more details. – Brian Goetz May 20 '16 at 19:06
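
For illustration, the subclassing idea from the comments above might look roughly like the sketch below. The class name `SquaresStatistics` and its helper `of` are hypothetical (they are not part of the JDK); only `DoubleSummaryStatistics`, `accept`, `combine` and the three-argument `DoubleStream.collect` are standard. The numerical caveat still applies: a naive running sum of squares can lose precision on badly scaled data.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;

// Hypothetical subclass: tracks the sum of squares alongside the usual statistics.
class SquaresStatistics extends DoubleSummaryStatistics {
    private double sumOfSquares;

    @Override
    public void accept(double value) {
        super.accept(value);            // updates count, sum, min, max
        sumOfSquares += value * value;  // naive accumulation, see the caveat above
    }

    // Overload used to merge partial results (needed for parallel streams).
    public void combine(SquaresStatistics other) {
        super.combine(other);
        sumOfSquares += other.sumOfSquares;
    }

    public double getSumOfSquares() {
        return sumOfSquares;
    }

    // Hypothetical convenience method: one pass over the array.
    static SquaresStatistics of(double[] values) {
        return Arrays.stream(values)
                     .collect(SquaresStatistics::new,
                              (stats, d) -> stats.accept(d),
                              (a, b) -> a.combine(b));
    }
}
```

With this, `SquaresStatistics.of(doubles)` yields both `getSum()` and `getSumOfSquares()` from a single traversal.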

1 Answer

14

Well, you could with a custom collector (here, the three-argument `collect`), for instance:

double[] res =
    Arrays.stream(doubles)
          .collect(() -> new double[2],
                   (arr, e) -> {arr[0]+=e; arr[1]+=e*e;},
                   (arr1, arr2) -> {arr1[0]+=arr2[0]; arr1[1]+=arr2[1];});

double sum = res[0];
double sumOfSquares = res[1];

but you don't gain much readability in my opinion, so I would stick with the multiple-pass solution (or maybe just use a for-loop in this case).
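
For comparison, the for-loop alternative mentioned above could be sketched as follows. The class and method names are made up for the example; the returned array mirrors the `res` layout used in the collector version (index 0 is the sum, index 1 is the sum of squares).

```java
// Hypothetical helper: a single pass with a plain loop, no streams involved.
final class SumAndSquares {
    private SumAndSquares() {}

    // Returns {sum, sumOfSquares} for the given values.
    static double[] sumAndSumOfSquares(double[] values) {
        double sum = 0.0;
        double sumOfSquares = 0.0;
        for (double v : values) {
            sum += v;
            sumOfSquares += v * v;
        }
        return new double[] { sum, sumOfSquares };
    }
}
```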

Alexis C.
  • Thanks a lot! I'm still learning streams, so it's really helpful. – HectorBarbossa May 20 '16 at 18:41
  • Well, as far as I'm concerned, streams don't generally result in increased readability, especially where some more-than-basic operation is needed (as is the case here). I can't help but smile when stream proponents claim that for loops are less readable; they are by far the most intuitive, breaking down the logic in the most obvious way. That said, this solution really helps when one wants to do a parallel Monte Carlo simulation and wants the standard error too. The way I understand it, the alternative would be to use thread pools, futures, etc., and that's definitely less readable and concise? – Yian Pap Dec 09 '20 at 17:16
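
A rough sketch of the parallel Monte Carlo use case from the last comment, under some assumptions: the integrand `x * x` on [0, 1) is only a placeholder, and the class and method names are hypothetical. It reuses the same two-slot accumulator as the answer and derives the standard error from the sum and sum of squares, so the floating-point caveat raised in the question's comments applies here as well.

```java
import java.util.SplittableRandom;

// Hypothetical example: estimate E[U^2] for U uniform on [0, 1) in parallel.
final class MonteCarloSketch {
    private MonteCarloSketch() {}

    // Returns {mean, standardError} from n samples; n is assumed to be >= 2.
    static double[] estimate(long n, long seed) {
        double[] acc = new SplittableRandom(seed)
                .doubles(n)              // n uniform samples in [0, 1)
                .parallel()
                .map(x -> x * x)         // placeholder integrand
                .collect(() -> new double[2],
                         (a, y) -> { a[0] += y; a[1] += y * y; },
                         (a, b) -> { a[0] += b[0]; a[1] += b[1]; });

        double mean = acc[0] / n;
        // Sample variance via sum of squares; clamped at 0 to guard against rounding.
        double variance = Math.max(0.0, (acc[1] - n * mean * mean) / (n - 1));
        return new double[] { mean, Math.sqrt(variance / n) };
    }
}
```

The combiner (third argument) is what lets the stream merge per-thread partial sums, which is the part that would otherwise need explicit thread pools and futures.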