
The question is straightforward: why can't we use StringBuilder(...) as the identity function in the reduce(...) operation on Java 8 streams, while string1.concat(string2) can be used as the identity function?

string1.concat(string2) can be seen as similar to builder.append(string) (though it is understood that there are a few differences between these operations), but I am not able to understand the difference in the reduce operation. Consider the following example:

  List<String> list = Arrays.asList("1", "2", "3"); 
  
  // Example using the string concatenation operation
  System.out.println(list.stream().parallel()
            .reduce("", (s1, s2) -> s1 + s2, (s1, s2)->s1 + s2));

  // The same example, using the StringBuilder
  System.out.println(list.stream().parallel()
            .reduce(new StringBuilder(""),
                    (builder, s) -> builder.append(s),
                    (builder1, builder2) -> builder1.append(builder2)));
 
  // using the actual concat(...) method
  System.out.println(list.stream().parallel()
            .reduce("", (s1, s2) -> s1.concat(s2), (s1, s2) -> s1.concat(s2)));

Here is the output after executing the above lines:

 123
 321321321321   // output when StringBuilder() is used as Identity
 123

builder.append(string) is an associative operation, just as str1.concat(str2) is. So why does concat work while append doesn't?

theimpatientcoder
  • Just to clarify: "Why can't we use" indicates your code doesn't compile while the output you've shown seems to indicate it did but the output is different (and if so I'd say the StringBuilder version's output of `123` would be along the lines of what I'd expect). – Thomas Aug 24 '20 at 07:48
  • Also, please use clear and descriptive language. o/p doesn't really mean anything specific and can be interpreted in many ways. Am I right that you imply *Output*? – Giorgi Tsiklauri Aug 24 '20 at 07:50
  • `new StringBuilder("")` is the identity value used in all threads, that is, you're appending to the same instance. – Andy Turner Aug 24 '20 at 07:52
  • @GiorgiTsiklauri given the context it's pretty clear they mean "output". I agree it's ugly, but it's clear. – Federico klez Culloca Aug 24 '20 at 07:53
  • @FedericoklezCulloca it might be identifiable, but no, it is not *pretty clear*. I took a guess, but another person could take a different guess, so it very much depends. – Giorgi Tsiklauri Aug 24 '20 at 07:55
  • Also: you're not using `str1.concat(str2)`. The actual `concat` method is rarely used: the compiler will typically rewrite `str1 + str2` as `new StringBuilder(str1).append(str2).toString()` (or similar) - which is indeed also associative, so it doesn't really alter the point. – Andy Turner Aug 24 '20 at 07:55
  • I hope my edits will clarify a few of the above things. – theimpatientcoder Aug 24 '20 at 08:01
  • @AndyTurner starting with Java 9, `str1 + str2` will get compiled using a single `invokedynamic` instruction, to let the runtime decide what to do. Since the runtime is free to select a different strategy for every particular case, it could even redirect to `String.concat` when the term is concatenating exactly two strings. – Holger Aug 24 '20 at 08:29

3 Answers


Yes, append is indeed associative, but that is not the only requirement for the functions passed as the accumulator and combiner. According to the docs, they have to be:

  • Associative
  • Non-interfering
  • Stateless

append is not stateless. It is stateful. When you do sb.append("Hello"), not only does it return a StringBuilder with Hello appended to the end, it also changes the contents (i.e. the state) of sb.
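
To make that mutation visible, here is a quick sketch:

StringBuilder sb = new StringBuilder("Hello");
StringBuilder returned = sb.append(" World"); // append returns the very builder it was called on
System.out.println(sb);             // Hello World  <- sb itself was modified
System.out.println(returned == sb); // true         <- no new object was created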

Also from the docs:

Stream pipeline results may be nondeterministic or incorrect if the behavioral parameters to the stream operations are stateful. A stateful lambda (or other object implementing the appropriate functional interface) is one whose result depends on any state which might change during the execution of the stream pipeline.

Also because of this, new StringBuilder() is not a valid identity once the accumulator or the combiner has been applied: something will have been added to the (initially empty) string builder, and the following equation, which every identity must satisfy, is no longer satisfied:

combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t)

It is possible that the parallel stream makes use of the old string builders after calling the accumulators and/or combiners, and expects their contents to not be changed. However, the accumulators and combiners mutate the string builders, causing the stream to produce incorrect results.
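
As a rough sketch of how that equation breaks once the shared identity has been mutated (the new StringBuilder(u) copies below are only there so the comparison itself does not cause further mutation; this is an illustration, not what the stream implementation literally runs):

StringBuilder identity = new StringBuilder();
identity.append("1"); // some accumulator call has already used the shared identity

StringBuilder u = new StringBuilder("x");
String t = "2";

// combiner.apply(u, accumulator.apply(identity, t))
String lhs = new StringBuilder(u).append(identity.append(t)).toString(); // "x12"
// accumulator.apply(u, t)
String rhs = new StringBuilder(u).append(t).toString();                  // "x2"

System.out.println(lhs.equals(rhs)); // false - the identity requirement is violated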

On the other hand, concat satisfies all three of the above. It is stateless because it does not change the string it is called on; it just returns a new, concatenated string. (String is immutable anyway and can't be changed :D)
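
For contrast, a tiny illustration:

String s = "Hello";
String t = s.concat(" World");
System.out.println(s);      // Hello        <- the original string is untouched
System.out.println(t);      // Hello World  <- a brand-new String is returned
System.out.println(s == t); // false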

Anyway, this is a use case of mutable reduction with collect:

System.out.println((StringBuilder)list.stream().parallel()
    .collect(
        StringBuilder::new, 
        StringBuilder::append, 
        StringBuilder::append
    )
);
Sweeper
  • As an addendum: once the first `append` with a non-empty argument has been performed, the `StringBuilder` is not a valid *identity value* anymore, so it also violates the requirement that the first argument to the three-arg `reduce` must be an identity value. – Holger Aug 24 '20 at 08:24
  • Just to be verbose here: from this comment I understand that using StringBuilder() violates the property combiner.apply(u, accumulator.apply(identity, t)) == accumulator.apply(u, t), because the first append has already modified the state of the builder? – theimpatientcoder Aug 24 '20 at 08:37
  • @neerajdorle yes, that is what Holger said. Now that I think about it, I should have used this argument first instead. This is much simpler to articulate :D – Sweeper Aug 24 '20 at 08:44

After reading the docs and doing many tests, I think reduce works roughly like the following steps:

  1. there will be multiple threads doing the reduction, and every thread does a partial reduce;
  2. for the identity, there is only one instance; every accumulator call uses this same identity instance;
  3. each partial reduce first accumulates the identity instance with a string element to get a StringBuilder;
  4. then all these StringBuilders are combined;

So the problem is that every accumulation of the identity instance with a string element changes the identity. After the first accumulation, the identity is not an identity anymore.

For example, consider a list with 2 elements, {"1", "2"}. There will be 2 threads, each doing one accumulation, and one of them doing the final combine. Thread A accumulates the identity with element "1"; the result is a StringBuilder whose content is "1" (still the identity instance, because the return value of StringBuilder.append is the builder itself), but the identity has also changed to "1". Then thread B accumulates the identity with element "2"; the result is "12", not "2" anymore. Then the combine is applied to the results of these two accumulations, which are both the identity instance itself, so the result is "1212". It is like the following code snippet:

StringBuilder identity = new StringBuilder();
StringBuilder accumulate1 = identity.append("1");
StringBuilder accumulate2 = identity.append("2");
StringBuilder combine = accumulate1.append(accumulate2);
// combine, accumulate1 and accumulate2 are all the same identity instance, and the result is "1212"
return combine; 

For more elements, because the threads run in a nondeterministic order, the result will be different every time.

Now that we know the reason, if we fix the accumulator as follows:

new StringBuilder(builder).append(s)

the full line of code will look like:

System.out.println(list.stream().parallel().reduce(new StringBuilder(),
        (builder, s) -> new StringBuilder(builder).append(s),
        (builder1, builder2) -> new StringBuilder(builder1).append(builder2)));

then there will be no issue anymore, because the accumulator no longer changes the identity instance and returns a new StringBuilder every time. But it is not worth doing this, as there is no benefit compared with the String concat approach.

Edit: Thanks to @Holger's example; it seems that if there is a filter function, some accumulator calls may be skipped, so the combiner function also needs to be changed to

new StringBuilder(builder1).append(builder2)
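
For example, using the test case Holger suggests in the comments below (the filter leaves some chunks empty, so the combiner can receive the untouched identity directly), the copying accumulator and combiner produce a stable result:

// needs: import java.util.stream.IntStream;
System.out.println(IntStream.range(0, 50)
        .parallel()
        .filter(i -> ((i >> 2) & 1) == 0)
        .mapToObj(i -> "" + (char) ('A' + i))
        .reduce(new StringBuilder(),
                (builder, s) -> new StringBuilder(builder).append(s),
                (builder1, builder2) -> new StringBuilder(builder1).append(builder2)));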
andy
  • Indeed, constructing a new `StringBuilder` will fix the issue, but you have to do it in both functions. Further, it will destroy the benefit of using a `StringBuilder` instead of `String.concat`. – Holger Aug 24 '20 at 11:02
  • I think the combiner function does not need a new StringBuilder, because the combiner will only use the objects returned from the accumulator. I have tested with a list of about 50 elements and the result is correct. – andy Aug 25 '20 at 07:20
  • It is not a proof of correctness if code happens to do the desired thing in a test. You don't know how many elements the accumulator function will process before the result gets passed to the combiner function. It may be zero. This might not happen when the source is a List and no filter nor flatMap is involved, but it is possible. – Holger Aug 25 '20 at 07:38
  • Thanks for your comment, @Holger. What I was trying to say is that the combiner gets its parameters from another combiner or from an accumulator and then returns an instance to another combiner, so if the accumulator returns a new instance every time, the result should be correct. I mentioned the test just to provide an example, not a proof. – andy Aug 25 '20 at 07:55
  • Try something like `IntStream.range(0, 50) .parallel() .filter(i -> ((i>>2)&1)==0) .mapToObj(i -> ""+(char)('A'+i)) .reduce( … )` – Holger Aug 25 '20 at 08:15
  • Cool, thanks @Holger. So the combiner also needs to use a new instance; it seems that if there is a filter, some accumulator calls will be skipped. Thanks, you learn something every day. – andy Aug 25 '20 at 08:39
  • If you want to know more about this parallel processing [Visualization of Java Stream parallelization](https://stackoverflow.com/q/34381805/2711488) may be interesting for you. – Holger Aug 25 '20 at 08:50

Don't use .reduce() when there is already an implementation (or write your own .collect(), like Sweeper's answer).

List<String> list = Arrays.asList("1", "2", "3"); 
  
// Example using the string concatenation operation
System.out.println(list.stream()
   .parallel()
   .collect(Collectors.joining())
);
// prints "123"

Edit (this will not work for parallel streams)

Depends on the implementation of .joining():

final List<String> list = Arrays.asList("1", "2", "3");
System.out.println(list.stream().reduce(new StringBuilder(), 
    StringBuilder::append, 
    StringBuilder::append)
    .toString()
);
// prints "123"
akop
  • I think that OP is asking about the different outputs obtained from the two reduce operations executed using parallel computation. The question is not about alternative methods of concatenating a stream of Strings. – RubioRic Aug 24 '20 at 08:01
  • “String is not StringBuilder” is not a useful explanation. The key point is to understand that String is an immutable type which can be used as value, e.g. in a Reduction, whereas using a mutable type requires [Mutable Reduction](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#MutableReduction). Or, as you would say it, “`reduce` is not `collect`”. – Holger Aug 24 '20 at 08:02
  • Maybe my edit explains the “String is not StringBuilder” sentence and why I posted the `.joining()` code. – akop Aug 24 '20 at 08:12
  • No, your edit made it worse. Using `StringBuilder` with `reduce` in this way is not correct, even when it happens to produce the intended result when you remove the `.parallel()`. Your edit shows that you didn’t understand the difference at all. – Holger Aug 24 '20 at 08:21
  • My question is more concerned with the parallel stream; in your edit, it prints 123 because the stream is not parallel. – theimpatientcoder Aug 24 '20 at 08:21
  • It's true, a parallel stream gives no guarantee of a correct order. – akop Aug 24 '20 at 08:27
  • And I only took the lambda definitions from `Collectors` and applied them to the OP's use case. – akop Aug 24 '20 at 08:29
  • So, I hope it is better now. ^^ – akop Aug 24 '20 at 08:32