0

If we have a list of doubles as List<Double> weights we want to select an index of it where the cumulative sum till that index is greater than a random number. If the loop reaches to the end of the list and still the condition is not met we return the last index inevitably. We can use a simple loop and do the check for that but I want to see if there are better, more Java, ways which use stream, reduce, parallel stream, or etc. that comes to your mind.

final double[] cSum = { 0 };
double rnd = random.nextDouble();
int ind = weights.stream().reduce(weight -> {
    cSum[0] += weight;

    if (rnd < cSum[0]) {
        return weights.indexOf(weight);
    }
}).get();
Harshal Parekh
  • 5,918
  • 4
  • 21
  • 43
Alan
  • 417
  • 1
  • 7
  • 22
  • have u tried anything ? or POC code ? – Ryuzaki L Aug 17 '23 at 21:07
  • I tried reduce but not sure how to use the if condition to check the the condition i mentioned inside it. this is what i have which is wrong and not complete. ```final double[] cSum = {0}; double rnd = random.nextDouble(); int ind = weights.stream().reduce(weight -> { cSum[0] += weight; if (rnd < cSum[0]) return weights.indexOf(weight); }).get();``` – Alan Aug 17 '23 at 21:12
  • i feel using classic for each loop with incremented value is good choice for performance also – Ryuzaki L Aug 17 '23 at 21:24
  • Streams aren't good for problems that process only part of a list. – Gene Aug 18 '23 at 03:01

1 Answers1

1

This is one of the problems where a for loop would be more readable over streams.

int sum = 0;
int random = 4;
int[] arr = {3, 5, 6, 1};
int index = arr.length - 1;
for (int i = 0; i < arr.length; i++) {
    sum += arr[i];
    if (sum > random) {
        index = i;
        break;
    }
} 

System.out.println(index);

I want to see if there are better, more Java, ways which use stream, reduce, parallel stream, or etc. that comes to your mind.

A for loop through an array is extremely lightweight both in terms of heap and CPU usage. If raw speed and memory thriftiness is a priority, using a stream is worse. The decision whether to use Streams or not should not be driven by performance consideration, but rather by readability. When it really comes to performance, there are other considerations.

Streams also follow lazy evaluation: Computation on the source data is only performed when the terminal operation is initiated, and source elements are consumed only as needed. All intermediate operations are lazy, so they’re not executed until a result of a processing is actually needed.

What you want is intermediate results at every index to compare it with the random number, personally, I'd prefer writing a normal for loop in this case.


Parallel Streams do not guarantee the order of traversal, so it'd be the wrong implementation choice here.


You can read more about for loops vs streams here

Harshal Parekh
  • 5,918
  • 4
  • 21
  • 43