int[] arr = new int[]{0};
l.stream().forEach(x -> {if (x > 10 && x < 15) { arr[0] += 1;}});

l is a List<Integer>. Here I use a one-element array, arr, to store a value that is changed inside the stream. An alternative is to use an instance of the AtomicInteger class. But I don't understand the difference between these two approaches in terms of memory usage, time complexity, safety...

Please note: I am not trying to use AtomicInteger (or array) in this particular piece of code. This code is used only as an example. Thanks!

masha
  • Hello, you should start by investigating each approach separately. Learn what each one is used for and in which cases. There are several articles on the internet to start with, and a couple here on StackOverflow. Maybe this is a starting [point](https://stackoverflow.com/questions/13598679/java-using-atomicinteger-vs-static-int). – alexandrum Dec 16 '21 at 15:02
  • What are you actually trying to do? – Bohemian Dec 16 '21 at 16:00
  • @Bohemian The real piece of code I am trying to implement using this technology tabulates the text with respect to the number of opening or closing brackets. But that piece of code is too big to demonstrate the main idea: I need to count something inside the stream and use an outer variable to store the result. – masha Dec 17 '21 at 06:29
  • I would say, don't use a stream; just use a loop. Nonparallel streams, which you must use, are slower than loops anyway. Go for readability and simplicity first. – Bohemian Dec 17 '21 at 06:59

2 Answers


You should always use AtomicInteger:

  • The performance impact is negligible. Technically, new int[1] is 'faster': the two are the same size, or the array is actually larger on the heap (unlikely; it depends on your architecture, and usually they end up being the same size), and the array does not spend any cycles on concurrency protections. But there are really only two options: [A] the concurrency protections are required (because the lambda runs in another thread), and then the int array is a non-starter, as it would result in horrible, hard-to-find bugs; or [B] they aren't required, and the HotSpot engine is likely to figure that out and eliminate this cost entirely. Even if it doesn't, the overhead of concurrency protection when there is no contention is low in any case.

  • It is more readable. Only slightly so, but new int[1] is weirder than new AtomicInteger(), I'd say. AtomicInteger at least suggests: I want a mutable int that I'm going to mess with from other contexts.

  • It is more convenient. System.out.println-ing an AtomicInteger prints the value. Printing an array prints its type and hash code (something like [I@1b6d3586), not its contents.

  • The convenience methods in AtomicInteger might be relevant. Maybe compareAndSet is useful.
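
The thread-safety point in the first bullet can be sketched roughly as follows. The class and method names (CounterDemo, countWithAtomic, countWithArray) are made up for illustration and are not from the question. The AtomicInteger version stays correct on a parallel stream; the array version is only shown on a sequential stream, because on a parallel one its increments could be lost.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CounterDemo {
    // Counts values strictly between 10 and 15.
    // incrementAndGet is atomic, so this is safe on a parallel stream.
    static int countWithAtomic(List<Integer> values) {
        AtomicInteger counter = new AtomicInteger();
        values.parallelStream().forEach(x -> {
            if (x > 10 && x < 15) {
                counter.incrementAndGet();
            }
        });
        return counter.get();
    }

    // Same logic with a one-element array. counter[0]++ is not atomic,
    // so this sketch deliberately sticks to a sequential stream.
    static int countWithArray(List<Integer> values) {
        int[] counter = {0};
        values.stream().forEach(x -> {
            if (x > 10 && x < 15) {
                counter[0]++;
            }
        });
        return counter[0];
    }

    public static void main(String[] args) {
        List<Integer> values = IntStream.range(0, 100_000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(countWithAtomic(values));
        System.out.println(countWithArray(values));
    }
}
```

Switching countWithArray to parallelStream() would compile just as happily, which is exactly why the resulting bugs are hard to find.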

But why?

Lambdas are not transparent in the following 3 things:

  • Checked exceptions (you cannot throw a checked exception inside a lambda even if the context around your lambda catches it).
  • Mutable local vars (you cannot touch, let alone change, any variable declared outside of the lambda, unless it is (effectively) final).
  • Control flow. You can't use break, continue, or return inside a lambda and have it act as if the lambda weren't there: you can't break or continue a loop located outside of your lambda, and you can't return from the method outside of your lambda (you can only return from the lambda itself).
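
A small sketch of the control-flow point, with hypothetical names (ControlFlowDemo, firstLongWord, shortWords): in a plain loop, return leaves the enclosing method, while inside a lambda it only ends the current call of the lambda, so the iteration carries on.

```java
import java.util.ArrayList;
import java.util.List;

class ControlFlowDemo {
    // Plain loop: 'return' exits the whole method at the first match.
    static String firstLongWord(List<String> words) {
        for (String w : words) {
            if (w.length() > 3) {
                return w;
            }
        }
        return null;
    }

    // Lambda: 'return' only ends this one invocation of the lambda,
    // so it behaves like 'continue' and forEach keeps iterating.
    static List<String> shortWords(List<String> words) {
        List<String> result = new ArrayList<>();
        words.forEach(w -> {
            if (w.length() > 3) {
                return;
            }
            result.add(w);
        });
        return result;
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "long", "bb", "longer", "c");
        System.out.println(firstLongWord(words));
        System.out.println(shortWords(words));
    }
}
```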

These are all very bad things when the lambda runs 'in context', but they are all very good things when the lambda doesn't run in context.

Here is an example:

new TreeSet<Integer>((a, b) -> a - b);

Here I have created a TreeSet (which is a set that keeps its elements sorted automatically). To make one, you pass in code that determines for any 2 elements which one is 'the higher one', and TreeSet takes care of everything else. That TreeSet can survive your method (just store it in a field or pass it to a method that ends up storing it in a field) and could even escape your thread (have another thread read that field). That means when that code (a - b in this code) is invoked, we could be 5 days from the creation of that TreeSet, in another thread, with the code that 'surrounds' your new TreeSet statement having loooong gone.

In this scenario, all those transparencies make no sense at all:

What does it mean to break back to a loop that has long since completed and the system doesn't even know what it is about anymore?

That catch block uses context that is long gone, such as local vars or the parameters. It can't survive, so if your a - b were to throw something that is checked, the fact that you've wrapped your new TreeSet<> in a try/catch block is meaningless.

What does it mean to access a variable that no longer exists? For that matter, if it still does exist but the lambda runs in a separate thread, do we now start making local vars volatile and declare them on heap instead of stack just in case?

Of course, if your lambda runs within context, as in, you pass the lambda to some method and that method 'uses it or loses it': Runs your lambda a certain amount of times and then forgets all about it, then those lacking transparencies are really annoying.

It's annoying that you can't do this:

public List<String> toLines(List<Path> files) throws IOException {
  return files.stream()
    .filter(x -> x.toString().endsWith(".txt"))
    .flatMap(x -> Files.readAllLines(x).stream()) // does not compile: unhandled IOException
    .toList();
}

The only reason the above code fails is that Files.readAllLines() throws IOException. We declared that the method throws it onwards, but that doesn't help, because the exception is thrown inside the lambda. You have to kludge up this code, make it bad, by trying to somehow transit that exception out of the lambda or otherwise work around it (the right answer is NOT to use the stream API at all here; write it with a normal for loop!).
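
For comparison, a minimal sketch of the plain-loop version. The wrapper class name LineReader is an assumption added only so that the snippet compiles on its own. Because Files.readAllLines is now called directly in the method body, the throws IOException clause applies as expected.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineReader {
    // Collects the lines of every .txt file, in the order the files are given.
    static List<String> toLines(List<Path> files) throws IOException {
        List<String> allLines = new ArrayList<>();
        for (Path file : files) {
            if (file.toString().endsWith(".txt")) {
                // The checked exception simply propagates to the caller.
                allLines.addAll(Files.readAllLines(file));
            }
        }
        return allLines;
    }
}
```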

Whilst trying to dance around checked exceptions in lambdas is generally just not worth it, you CAN work around the problem of wanting to share a variable with outer context:

int sum = 0;
listOfInts.forEach(x -> sum += x);

The above doesn't work - sum is from the outer scope and thus must be effectively final, and it isn't. There's no particular reason it can't work, but java won't let you. The right answer here is to use int sum = listOfInts.stream().mapToInt(Integer::intValue).sum(); instead, but you can't always find a terminal op that just does what you want. Sometimes you need to kludge around it.

That's where new int[1] and AtomicInteger come in. These are references - and the reference is final, so you CAN use them in the lambda. But the reference points at an object whose contents you can change at will; hence, you can use this 'trick' to 'share' a variable:

AtomicInteger sum = new AtomicInteger();
listOfInts.forEach(x -> sum.addAndGet(x));

That DOES work.
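
The one-element array form of the same trick might look like this. The class name ArraySumDemo is made up for illustration. The array reference is effectively final; only the slot it points at changes.

```java
import java.util.List;

class ArraySumDemo {
    static int sum(List<Integer> listOfInts) {
        int[] sum = {0};                      // the reference never changes
        listOfInts.forEach(x -> sum[0] += x); // only the array's content does
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3))); // prints 6
    }
}
```

List.forEach runs in the calling thread here, so the missing concurrency protection does no harm in this particular sketch.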

rzwitserloot

Knowing which is the best way is important and @rzwitserloot's explanation covers that in great detail. In your specific example, you could avoid the issue by doing it like this.

List<Integer> list = List.of(1,2,11,12,15,11,11,9,10,2,3);

int count = list.stream().filter(x->x > 10 && x < 15).reduce(0, (a,b)->a+1);
// or
int count = list.stream().filter(x->x > 10 && x < 15).mapToInt(x->1).sum();

Both return the value 4

In the first example, reduce starts from an initial value of 0 and then adds 1 to it for each element (b is syntactically required but not used). To sum the actual elements rather than 1, replace 1 with b in the reduce method.

In the second example, the values are replaced with 1 in the stream and then summed. Since the method sum() doesn't exist for streams of objects, the 1 needs to be mapped to an int to create an IntStream. To sum the actual elements here, use mapToInt(x->x).

As suggested in the comments, you can also do it like this.

long count = list.stream().filter(x->x > 10 && x < 15).count();

count() returns a long, so it would have to be cast down to an int if that is what you want.
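
One possible way to do that conversion, as a sketch rather than a recommendation from the answer: Math.toIntExact throws an ArithmeticException if the value doesn't fit in an int, whereas a plain (int) cast would silently truncate. The class and method names are hypothetical.

```java
import java.util.List;

class CountDemo {
    static int countInRange(List<Integer> list) {
        long count = list.stream()
                .filter(x -> x > 10 && x < 15)
                .count();
        // Fails loudly instead of overflowing if count exceeds Integer.MAX_VALUE.
        return Math.toIntExact(count);
    }

    public static void main(String[] args) {
        List<Integer> list = List.of(1, 2, 11, 12, 15, 11, 11, 9, 10, 2, 3);
        System.out.println(countInRange(list)); // prints 4
    }
}
```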

WJS
  • Any particular reason why the second example is summing 1s, instead of doing a simple count, as the original code effectively does? – Boris B. Dec 16 '21 at 15:53
  • I thought the op wanted an integer and I didn't want to cast. Count returns a long. But your suggestion is a good one. – WJS Dec 16 '21 at 15:54