7

Why does the following code:

StringBuilder sb22 = IntStream
     .range(1, 101)
     .filter(x -> x > 50)
     .boxed()
     .parallel()
     .collect(// object that is used in accumulator to do accumulating on
              StringBuilder::new, 
              // use object from above and call append on it with each stream element as argument
              (sb, a) -> sb.append(":" + a),  
              // (executes only when using parallel!)
              (sb1, sb2) -> { 
                     System.out.println(Thread.currentThread().getId() + "  " + "sb1=" + sb1 + " AND " + "sb2=" + sb2); 
                     sb1.append("-"+sb2);
              });

produces this result:

------------------:51:52:53-:54:55:56-:57:58:59-:60:61:62-:63:64:65-:66:67:68-:69:70:71-:72:73-:74:75-:76:77:78-:79:80:81-:82:83:84-:85:86:87-:88:89:90-:91:92:93-:94:95:96-:97:98-:99:100

shouldn't first part (------------------) be excluded from the output?

Also, I understood that combiner in collect can potentially be called out of order so it is possible to have instead :76:77:78-:79:80:81 e.g. :63:64:65-:79:80:81?

UPDATE (after @Holger response)

This is the tree generated using code he linked for this case:

                                                                                                                                                    [51..100]                                                                                                                                                     
                                                                        _________________________________________________________________________________/\______________________________________________________________________                                                                                 
                                                                       |                                                                                                                                                         |                                                                                
                                                                    (empty)                                                                                                                                                 [51..100]                                                                             
                                    ___________________________________/\__________________________________                                                                              ________________________________________/\______________________________________                                         
                                   |                                                                       |                                                                            |                                                                                |                                        
                                (empty)                                                                 (empty)                                                                     [51..75]                                                                         [76..100]                                    
                ___________________/\______________                                     ___________________/\______________                                       ______________________/\________________                                         ______________________/\________________                       
               |                                   |                                   |                                   |                                     |                                        |                                       |                                        |                      
            (empty)                             (empty)                             (empty)                             (empty)                              [51..62]                                 [63..75]                                [76..87]                                 [88..100]                  
        _______/\______                 ___________/\______                     _______/\______                 ___________/\______                      ________/\_______                   _____________/\_______                       ________/\_______                   _____________/\_______              
       |               |               |                   |                   |               |               |                   |                    |                 |                 |                      |                     |                 |                 |                      |             
    (empty)         (empty)         (empty)             (empty)             (empty)         (empty)         (empty)             (empty)             [51..56]          [57..62]          [63..68]               [69..75]              [76..81]          [82..87]          [88..93]               [94..100]         
    ___/\__         ___/\__         ___/\__         _______/\__             ___/\__         ___/\__         ___/\__         _______/\__              ___/\___          ___/\___          ___/\___          ________/\__               ___/\___          ___/\___          ___/\___          ________/\___         
   |       |       |       |       |       |       |           |           |       |       |       |       |       |       |           |            |        |        |        |        |        |        |            |             |        |        |        |        |        |        |             |        
(empty) (empty) (empty) (empty) (empty) (empty) (empty)     (empty)     (empty) (empty) (empty) (empty) (empty) (empty) (empty)     (empty)     [51..53] [54..56] [57..59] [60..62] [63..65] [66..68] [69..71]     [72..75]      [76..78] [79..81] [82..84] [85..87] [88..90] [91..93] [94..96]     [97..100]     
                                                            ___/\__                                                                 ___/\__                                                                         ___/\___                                                                         ____/\__     
                                                           |       |                                                               |       |                                                                       |        |                                                                       |        |    
                                                        (empty) (empty)                                                         (empty) (empty)                                                                [72..73] [74..75]                                                                [97..98] [99..100]
Bojan Vukasovic
  • 2,054
  • 22
  • 43
  • And what is your expected output? – Bentaye Mar 05 '18 at 13:50
  • 1
    As a side note: You don't need the filter, just do `.range(51, 101)` – Bentaye Mar 05 '18 at 13:53
  • 2
    @Bentaye if not for that filter, the `------------------` part wouldn't have been in the output an that question wouldn't have been asked :) – Eran Mar 05 '18 at 13:58
  • @Eran it is just a comment, not an answer :) – Bentaye Mar 05 '18 at 14:05
  • @Eran that is a very good comment btw, no idea if you realized *how* good it actually is; because without the `filter` there would not be empty `StringBuilders` to merge with other non-empty ones... – Eugene Mar 05 '18 at 19:47
  • @Eugene well, the comment was based on Patrick Parker's answer, and I also tested it and found it to be true. – Eran Mar 05 '18 at 19:51
  • @Eran right, but this will happen regardless of type (`StringBuilder`). For elements that are "filtered", there will always be values from the `Supplier` generated and merged. Do you know if this has been asked before btw? Seems like a very interesting question and why this happens this way – Eugene Mar 05 '18 at 20:28

3 Answers3

4

The workload splitting happens before anything has been processed, hence, the Stream implementation will split the range [1, 101] into subranges to be processed. At this point, it doesn’t know that the filter will remove the first half entirely, it can’t know without evaluating the predicate and that should already happen in parallel, hence, after the workload splitting.

Therefore, each subrange gets processed the same way, including collecting the results into a container and combining those containers afterwards, even if they happen to be empty. The specification doesn’t say that the combining step will be skipped when no elements arrived at the collector, hence, you shouldn’t expect that. While in theory, it would be possible to track whether any elements reached the collector, this tracking would only serve a specific case and it’s not even clear whether combining a container with an empty container (like adding an empty List or appending an empty StringBuilder) is more expensive than this tracking.

Of course, nothing stops you from optimizing your combiner, if it retains the semantics, e.g. instead of (sb1, sb2) -> sb1.append(sb2), you could use (sb1, sb2) -> sb1.length()==0? sb2: sb1.append(sb2)

You can look at this Q&A, “Visualization of Java Stream parallelization” for more details.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • This seems like a logical explanation. Can you please also tell me if I broke associativity rule by using this `sb1.append("-"+sb2)` (as in comments below) ? – Bojan Vukasovic Mar 06 '18 at 10:04
  • 1
    “Associativity” alone isn’t enough to explain it. The [documentation of `Collector`](https://download.java.net/java/jdk10/docs/api/java/util/stream/Collector.html) says “To ensure that sequential and parallel executions produce equivalent results, the collector functions must satisfy an *identity* and an *associativity* constraints. — The identity constraint says that for any partially accumulated result, combining it with an empty result container must produce an equivalent result.” – Holger Mar 06 '18 at 10:21
  • Thanks. I updated question with tree generated with code from your link. – Bojan Vukasovic Mar 06 '18 at 10:29
2

It looks an attempted optimization resulted in some unnecessary StringBuilders being created to process x < 51. These builders never accumulated strings due to the filter, but even though they were empty they were still concatenated to the others. Maybe with smarter optimization some of that work could have been eliminated.

Regarding your second question, if you wanted to swap order only during the concatenation you would write sb2.append(sb1), although that would create unreliable results since you are appending in a different order, and this inconsistent behavior would go against the contract.

Patrick Parker
  • 4,863
  • 4
  • 19
  • 51
  • for the second, you will be violating the contract (by the docs), this is only allowed for something that does not care about order, like a `Set` for example – Eugene Mar 05 '18 at 13:56
  • 2
    @FedericoPeraltaSchaffner oh? try changing the range to `51,101` or try removing the filter. Either change will eliminate the leading hyphens. – Patrick Parker Mar 05 '18 at 14:00
  • @PatrickParker it's like putting elements in parallel into an `ArrayList` and trying to understand what and where exactly fails. – Eugene Mar 05 '18 at 14:07
2

You have broken associativity in sb1.append("-"+sb2), this is stated by the docs. So, in parallel execution what you get is really unknown/unpredictable.

A correct combiner would be for example StringBuilder::append or as lambda :

(left, right) -> left.append(right)

They can't be out of order, they will preserve the order (whatever that order is). For example if you would have streamed from a HashSet (which does not have any order) you would get a different result. Potentially, using java-9 and Set.of this result will differ from run to run.

Eugene
  • 117,005
  • 15
  • 201
  • 306
  • I guess I'm wrong, so can you pls. explain if this logic of associativity is wrong: `A op B op C op D == (A op B) op (C op D)`; First case: `'a-b'`, `'a-b-c'`, `'a-b-c-d'`; Second case: `'a-b'`, `'c-d'`, `'a-b-c-d'`? – Bojan Vukasovic Mar 05 '18 at 14:50
  • this answer entirely misses the point. of course he broke associativity... the whole purpose of his code was to expose the inner workings, which one shouldn't normally do. – Patrick Parker Mar 06 '18 at 06:09
  • @PatrickParker :) of course he did? *pls. explain if this logic of associativity is wrong* rings a bell may be? Also, if your answer focuses on the "inner workings", then I sort of missed them. You might want to expand it explaining why the empty StringBuilders are merge in the first place – Eugene Mar 06 '18 at 07:42
  • @Eugene Sorry you didn't find my comment helpful, but please follow appropriate etiquette. If there is a problem with another answer, you should post the comment there. I stand by what I wrote here; it is not being petty, it is a valid critique that this answer entirely misses the point. See the accepted answer for details. – Patrick Parker Mar 06 '18 at 13:21
  • @PatrickParker I wasn't ranting, sorry if I made it sound as such, that was not my point. my point was that this is not associative and notice that `identity` is enforced via `StringBuilder::new`. The answer's point was that both associativity and identity are needed to support a correct sequential/parallel result. – Eugene Mar 06 '18 at 13:26