
I had a discussion with a colleague about whether we should be using setters inside stream.map(), as in the solution suggested here - https://stackoverflow.com/a/35377863/1552771

There is a comment on that answer that discourages using map this way, but no reason has been given as to why it is a bad idea. Can someone provide a possible scenario where this can break?

I have seen some discussions where people talk about concurrent modification of the collection itself, by adding or removing items from it, but are there any negatives to using map just to set some values on a data object?
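For reference, a minimal sketch of the pattern being debated; the Foo class and the isBlue()/setTitle() names here are hypothetical, made up for illustration (the linked answer uses its own domain objects):

import java.util.List;
import java.util.stream.Collectors;

class Foo {
    private boolean blue;
    private String title;

    Foo(boolean blue) { this.blue = blue; }

    boolean isBlue() { return blue; }

    void setTitle(String title) { this.title = title; }
}

class SetterInMapExample {
    public static void main(String[] args) {
        List<Foo> foos = List.of(new Foo(true), new Foo(false), new Foo(true));

        // The debated pattern: map() is used only to invoke a setter
        // and returns the very same element it received.
        List<Foo> blueFoos = foos.stream()
                .filter(Foo::isBlue)
                .map(foo -> { foo.setTitle("Some value"); return foo; })
                .collect(Collectors.toList());

        System.out.println(blueFoos.size()); // 2
    }
}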

  • I guess it depends on *what* you are setting. `map` is supposed to *transform* the elements of a stream into different elements. So if you are simply "misusing" it to do a `forEach` and continue with the stream (too bad `forEach` doesn't return the stream), then I think it is just "bad style" but not "dangerous". It's like setting `question.setAnswered(true)` when the setter does something else, like `setAnswer(boolean b){ deepThought.meditateOverMeaningOfLife(); }` .. a `setter` is for setting, a `map()` is for mapping. – GameDroids Oct 22 '19 at 13:29
  • I was not able to add a comment to the original post as I do not have enough reputation. That would have been easier. – raksugi Oct 22 '19 at 13:29
  • One of the problems is that the `map` operation might not even be performed if the whole Stream pipeline decides it does not have to be performed to compute the final result. I have come across this using Java 11 - I am not sure if it applies to earlier versions - for Java 8, though, there is no such optimization. – Michał Krzywański Oct 22 '19 at 13:29
  • I forget myself, but I have a hunch that it's due to [this](https://docs.oracle.com/en/java/javase/13/docs/api/java.base/java/util/stream/package-summary.html#NonInterference), in addition to the problem that @michalk specified. – Avi Oct 22 '19 at 13:30
  • @Avi the discussions on the original question mention that modifying an object is not interference, as they do not modify the data *source* (which contains only references anyway). – RealSkeptic Oct 22 '19 at 13:34
  • @Avi Thanks for the link, but there is a comment from Brian Goetz on the previous answer in that thread saying that this does not constitute interference (he is talking about peek, but I assume that applies to map as well), but is an abuse of `peek`'s contract. – raksugi Oct 22 '19 at 13:38
  • @raksugi Abuse of `peek`'s contract is different from `map` - `map` does not have the same intention (`peek` is supposed to be used for debugging only, `map` is not). – Avi Oct 22 '19 at 13:57
  • @michalk in case of `count()`, it’s since Java 9. In case of short-circuiting methods, like `…Match` or `find…`, it’s always the case that the `map` is not performed on all elements. But you cannot assume that it is only applied to the necessary minimum number of elements either, so it’s just unpredictable. Most of [this answer](https://stackoverflow.com/a/33636377/2711488) for `peek` applies to side effects in `map` as well, when just adjusting for the purpose of the operation (to provide a value when needed). – Holger Oct 23 '19 at 08:23

4 Answers


Using side effects in map, like invoking a setter, has a lot of similarities to using peek for non-debugging purposes, which has been discussed in "In Java streams is peek really only for debugging?"

This answer has a very good piece of general advice:

Don't use the API in an unintended way, even if it accomplishes your immediate goal. That approach may break in the future, and it is also unclear to future maintainers.

Whereas the other answer names the associated practical problems, I have to cite myself:

The important thing you have to understand is that streams are driven by the terminal operation. The terminal operation determines whether all elements have to be processed or any at all.

When you place an operation with a side effect into a map function, you have a specific expectation about which elements it will be performed on and perhaps even how it will be performed, e.g. in which order. Whether that expectation will be fulfilled depends on the other subsequent Stream operations and perhaps even on subtle implementation details.

To show some examples:

IntStream.range(0, 10) // outcome changes with Java 9
    .mapToObj(i -> System.out.append("side effect on "+i+"\n"))
    .count();
IntStream.range(0, 2) // outcome changes with Java 10 (or 8u222)
    .flatMap(i -> IntStream.range(i * 5, (i+1) * 5 ))
    .map(i -> { System.out.println("side effect on "+i); return i; })
    .anyMatch(i -> i > 3);
IntStream.range(0, 10) // outcome may change with every run
    .parallel()
    .map(i -> { System.out.println("side effect on "+i); return i; })
    .anyMatch(i -> i > 6);

Further, as already mentioned in the linked answer, even if you have a terminal operation that processes all elements and is ordered, there is no guarantee about the processing order (or concurrency for parallel streams) of intermediate operations.
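To illustrate that point, a minimal sketch: even with forEachOrdered as the terminal operation, the side effects inside map of a parallel stream may run in any order and on any thread; only the terminal action observes the encounter order.

import java.util.stream.IntStream;

class UnorderedIntermediateDemo {
    public static void main(String[] args) {
        IntStream.range(0, 10)
                .parallel()
                .map(i -> {
                    // side effect: may print in any order, on any worker thread
                    System.out.println("side effect on " + i);
                    return i;
                })
                .forEachOrdered(i ->
                        // terminal action: guaranteed to print 0..9 in order
                        System.out.println("terminal action on " + i));
    }
}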

The code may happen to do the desired thing when you have a stream with no duplicates, a terminal operation that processes all elements, and a map function that only calls a trivial setter, but the code has so many dependencies on subtle surrounding conditions that it will become a maintenance nightmare. Which brings us back to the first quote about using an API in an unintended way.

Holger

I think the real issue here is that it is just bad practice and violates the intended use of the capability. For example, one can also accomplish the same thing with filter. This perverts its use and also makes the code confusing or, at best, unnecessarily verbose.

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public static void main(String[] args) {
    List<MyNumb> foo =
            IntStream.range(1, 11)
                     .mapToObj(MyNumb::new)
                     .collect(Collectors.toList());
    System.out.println(foo);

    // Abusing filter: the predicate mutates every element and always returns true
    foo = foo.stream().filter(i -> {
        i.value *= 10;
        return true;
    }).collect(Collectors.toList());
    System.out.println(foo);
}

class MyNumb {
    int value;

    public MyNumb(int v) {
        value = v;
    }

    @Override
    public String toString() {
        return Integer.toString(value);
    }
}

So, going back to the original example: one does not need to use map at all, which results in the following rather ugly mess (a more conventional alternative is sketched after it).

foos = foos.stream()
        .filter(foo -> {
            boolean b = foo.isBlue();
            if (b) {
                foo.setTitle("Some value");
            }
            return b;
        })
        .collect(Collectors.toList());
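For comparison, a sketch of a more conventional way to get the same result, reusing the same hypothetical foos list and Foo type from the snippet above: filter and collect first, then apply the setter in a plain forEach, where the side effect is at least explicit.

foos = foos.stream()
        .filter(Foo::isBlue)
        .collect(Collectors.toList());
foos.forEach(foo -> foo.setTitle("Some value"));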

WJS

Streams are not just some new set of APIs that make things easier for you. They also bring the functional programming paradigm with them.

And the most important aspect of the functional programming paradigm is to use pure functions for computations. A pure function is one whose output depends only on its input. So, basically, the functions you pass to the Streams API should be stateless, side-effect-free and pure.

Quoting from Joshua Bloch's Effective Java (3rd Edition):

If you’re new to streams, it can be difficult to get the hang of them. Merely expressing your computation as a stream pipeline can be hard. When you succeed, your program will run, but you may realize little if any benefit. Streams isn’t just an API, it’s a paradigm based on functional programming. In order to obtain the expressiveness, speed, and in some cases parallelizability that streams have to offer, you have to adopt the paradigm as well as the API. The most important part of the streams paradigm is to structure your computation as a sequence of transformations where the result of each stage is as close as possible to a pure function of the result of the previous stage. A pure function is one whose result depends only on its input: it does not depend on any mutable state, nor does it update any state. In order to achieve this, any function objects that you pass into stream operations, both intermediate and terminal, should be free of side-effects. Occasionally, you may see streams code that looks like this snippet, which builds a frequency table of the words in a text file:

// Uses the streams API but not the paradigm--Don't do this!
Map<String, Long> freq = new HashMap<>();
try (Stream<String> words = new Scanner(file).tokens()) {
    words.forEach(word -> {
        freq.merge(word.toLowerCase(), 1L, Long::sum);
    });
}

What’s wrong with this code? After all, it uses streams, lambdas, and method references, and gets the right answer. Simply put, it’s not streams code at all; it’s iterative code masquerading as streams code. It derives no benefits from the streams API, and it’s (a bit) longer, harder to read, and less maintainable than the corresponding iterative code. The problem stems from the fact that this code is doing all its work in a terminal forEach operation, using a lambda that mutates external state (the frequency table). A forEach operation that does anything more than present the result of the computation performed by a stream is a “bad smell in code,” as is a lambda that mutates state. So how should this code look?

// Proper use of streams to initialize a frequency table
Map<String, Long> freq;
try (Stream<String> words = new Scanner(file).tokens()) {
    freq = words.collect(groupingBy(String::toLowerCase, counting()));
}
isank-a

Just to name a few:

  • map() with a setter is interfering (it modifies the initial data), while the spec requires a non-interfering function. For more details, read this post.
  • map() with a setter is stateful (your logic may depend on the initial value of the field you're updating), while the spec requires a stateless function
  • even if you're not interfering with the collection that you're iterating over, the side effect of the setter is unnecessary (see the sketch after this list for a side-effect-free alternative)
  • setters in map may mislead future code maintainers
  • etc...
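To contrast, here is a minimal sketch of a side-effect-free version of the same idea (the immutable TitledFoo class and the withTitle method are hypothetical names used only for illustration): the function passed to map builds a new, transformed object instead of mutating its input, so it stays stateless and free of side effects.

import java.util.List;
import java.util.stream.Collectors;

// Hypothetical immutable variant of the data object
final class TitledFoo {
    final boolean blue;
    final String title;

    TitledFoo(boolean blue, String title) {
        this.blue = blue;
        this.title = title;
    }

    TitledFoo withTitle(String newTitle) {
        return new TitledFoo(blue, newTitle);
    }
}

class PureMapExample {
    public static void main(String[] args) {
        List<TitledFoo> foos = List.of(
                new TitledFoo(true, null), new TitledFoo(false, null));

        // map() produces a new element instead of mutating the one it received,
        // so the mapper has no side effects on the stream's source elements.
        List<TitledFoo> titled = foos.stream()
                .filter(f -> f.blue)
                .map(f -> f.withTitle("Some value"))
                .collect(Collectors.toList());

        System.out.println(titled.size()); // 1
    }
}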
ETO