I ran into a problem while merging two lists using streams. At present, when I use Stream.concat, the rows of the second list are appended below the rows of the first, which is not what I want.

List 1:

  • [ABC, 123, 456]
  • [DEF, 234, 567]
  • [GHI, 345, 678]

List 2:

  • [ABC, 789, 012]
  • [DEF, 890, 123]
  • [GHI, 901, 234]

Ideal Merged Output:

  • [ABC, 123, 456, 789, 012]
  • [DEF, 234, 567, 890, 123]
  • [GHI, 345, 678, 901, 234]

If the above is not possible, the following is also acceptable:

  • [ABC, 123, 456, ABC, 789, 012]
  • [DEF, 234, 567, DEF, 890, 123]
  • [GHI, 345, 678, GHI, 901, 234]

Output I am getting at present, which is not what I want:

  • [ABC, 123, 456]
  • [DEF, 234, 567]
  • [GHI, 345, 678]
  • [ABC, 789, 012]
  • [DEF, 890, 123]
  • [GHI, 901, 234]

Code:

List<List<XSSFCell>> listData1 = list1.stream().skip(1).collect(Collectors.toList());
List<List<XSSFCell>> listData2 = list2.stream().skip(1).collect(Collectors.toList());
Stream stream = Stream.concat(listData1.stream(), listData2.stream());

I hope the issue is clear. Any guidance is appreciated.

Added: This question has been marked as a duplicate of this question, but I am unable to find the StreamUtils.zip method mentioned in the solution. The first answer to that question says to use it at your own risk. Can this be done with the standard libraries, or how can I get the zip method in StreamUtils?

iCoder
  • So you have `Stream stream = Stream.concat(listData1.stream(),listData2.stream());`. What did you do with that to have the current output you have? (**Note, raw-types warning**) – Tunaki Apr 19 '16 at 16:34
  • Look into the `groupingBy` collector – Alexis C. Apr 19 '16 at 16:37
  • @Tunaki Sorry, I didn't get you. The output I am getting at present is because of `Stream stream = Stream.concat(listData1.stream(),listData2.stream());`. I have not done anything else. I checked what is stored in the Stream by printing it to the console using `stream.forEach(e -> System.out.println(e));`. I hope that explains clearly what I have done so far. – iCoder Apr 19 '16 at 16:41
  • Are you wanting index 0 to be a key where you look through and match up items in each stream based on the "key"? Or do you always merge row 1 of each stream, then row 2, then 3, etc? – DanO Apr 19 '16 at 16:47
  • @DanO In this case, the data in the 2 lists are sorted & the first column is always identical. Even if it is not I can sort them & then merge. – iCoder Apr 19 '16 at 16:51
  • This looks like a job for databases - joining, in particular. – rgettman Apr 19 '16 at 17:08
  • Are we allowed to modify listData1's sublists? – DanO Apr 19 '16 at 17:24
  • @DanO why is there a need for sublist in this case? Sorry am a little new to this, but from my limited understanding I dont think it would help. I do not want to modify the contents, but just merge the data into one. – iCoder Apr 19 '16 at 17:47
  • Basically, you want `Collectors.toMap`. And the key is `list.get(0)` and the value is `list.subList(1, list.size())` and the merge function adds the list together. Then you map each entry of that map into a list where the first element is the key and the rest is the value. – Tunaki Apr 19 '16 at 17:48
  • @Tunaki thank you. I am new to Map & Streams & hence unable to translate what you have informed into code. Is it possible for you to let me know how I should write the code or point me to a link where something like this done (I searched for Collectors.toMap but the links I found do not have the 2 lists merging case). Apologies for the trouble. – iCoder Apr 19 '16 at 18:08
  • @iCoder, the first two examples have a list that we build into so that we don't modify the contents of any of the lists. Realistically, we can do a bunch of other examples that use streams, or don't, in many different ways, but the ones that you now have should give you a good flavour as to how we can do all of this stuff. Happy coding. – DanO Apr 19 '16 at 20:39
  • Why is this marked as a duplicate of some question regarding streams.zip? No clue how this has anything to do with zipping a stream.... – DanO Apr 19 '16 at 20:55
  • @DanO That's because [zipping Streams is the true functional answer to this question](https://ideone.com/eqLKd6) (no mutation or anything). Take `zip` from the linked question, you have `List> result = zip(list1.stream(), list2.stream(), (l1, l2) -> { List l = new ArrayList<>(l1); l.addAll(l2.subList(1, l2.size())); return l; }).collect(toList());` And that's it. It's only complicated because there's no simple functional way to `addAll`... I really wish `zip` existed in the Stream API. – Tunaki Apr 19 '16 at 21:23
  • Yes, but *it's not part of the API* and the answer is a "use at your own risk" bit of code making use of `Spliterators`. It's akin to saying, "just use xxx third-party library", which isn't what the OP wanted. – DanO Apr 20 '16 at 00:34
  • @Tunaki thank you very much. I am using the approach you've suggested (I have voted for it, but could not mark it as answered, no option for it, sorry). My question has been marked as duplicate, but I strongly believe that it would be hard for anyone to know that my query is similar to that. Also the solution suggested in that has a 'Use at your own risk' which does not give confidence – iCoder Apr 20 '16 at 03:29
  • @iCoder Do both of your lists have the same size? – Moinkhan Apr 22 '16 at 05:06
  • @Moinkhan yes they are of same size – iCoder Apr 22 '16 at 05:10
  • OK, and you want a third list or a new Stream, made by merging the two of them? – Moinkhan Apr 22 '16 at 05:13
  • I want a 3rd list containing the data merged from list 1 & 2. – iCoder Apr 22 '16 at 05:27
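
The `Collectors.toMap` idea from the comments above, and a positional alternative to `zip`, can both be written with the standard library only. The following is just a sketch under some assumptions: rows are `List<String>` rather than `List<XSSFCell>`, every row is non-empty, and each key appears at most once per list. The class and method names (`RowMerger`, `mergeByKey`, `mergeByPosition`) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class RowMerger {

    // Merge rows that share the same first element (the "key").
    // Assumes non-empty rows; LinkedHashMap keeps the keys in encounter order.
    static List<List<String>> mergeByKey(List<List<String>> list1, List<List<String>> list2) {
        Map<String, List<String>> merged = Stream.concat(list1.stream(), list2.stream())
                .collect(Collectors.toMap(
                        row -> row.get(0),                                   // key: first cell
                        row -> new ArrayList<>(row.subList(1, row.size())),  // value: copy of the rest
                        (left, right) -> { left.addAll(right); return left; },
                        LinkedHashMap::new));

        // Rebuild each row as key followed by the merged values
        return merged.entrySet().stream()
                .map(e -> {
                    List<String> row = new ArrayList<>();
                    row.add(e.getKey());
                    row.addAll(e.getValue());
                    return row;
                })
                .collect(Collectors.toList());
    }

    // Merge row i of list1 with row i of list2, ignoring the keys entirely.
    // Assumes both lists are sorted the same way; extra rows in the longer list are dropped.
    static List<List<String>> mergeByPosition(List<List<String>> list1, List<List<String>> list2) {
        return IntStream.range(0, Math.min(list1.size(), list2.size()))
                .mapToObj(i -> {
                    List<String> row = new ArrayList<>(list1.get(i));
                    List<String> other = list2.get(i);
                    row.addAll(other.subList(1, other.size()));
                    return row;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> list1 = Arrays.asList(
                Arrays.asList("ABC", "123", "456"),
                Arrays.asList("DEF", "234", "567"),
                Arrays.asList("GHI", "345", "678"));
        List<List<String>> list2 = Arrays.asList(
                Arrays.asList("ABC", "789", "012"),
                Arrays.asList("DEF", "890", "123"),
                Arrays.asList("GHI", "901", "234"));

        // prints [[ABC, 123, 456, 789, 012], [DEF, 234, 567, 890, 123], [GHI, 345, 678, 901, 234]]
        System.out.println(mergeByKey(list1, list2));
        System.out.println(mergeByPosition(list1, list2));
    }
}
```

Neither method modifies the input rows, since each one copies into a new `ArrayList` before appending. The keyed version tolerates differently ordered lists, while the positional one relies on the ordering guarantee given in the comments.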

1 Answer

How complex this gets is up to you, but the short version is that the second list has to be streamed anew for each row of the first. That means a method cannot simply be handed two streams: it can take one stream (which it traverses) and a Collection/List, but not two streams. A stream can be traversed only once, so running the second stream again from inside stream1.forEach(...) throws an IllegalStateException ("stream has already been operated upon or closed").
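
The single-use behaviour mentioned above can be checked with a small sketch. The class and method names here (`StreamReuseDemo`, `secondTraversalFails`, `countMatches`) are invented for the demonstration and are not part of the question's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamReuseDemo {

    // Returns true if running a second terminal operation on the same Stream instance fails.
    static boolean secondTraversalFails(List<String> source) {
        Stream<String> stream = source.stream();
        stream.forEach(s -> { });      // first terminal operation consumes the stream
        try {
            stream.forEach(s -> { });  // second terminal operation on the same instance
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Works because a fresh stream is created from the List for every outer element.
    static long countMatches(List<String> left, List<String> right) {
        return left.stream()
                .mapToLong(l -> right.stream().filter(l::equals).count())
                .sum();
    }

    public static void main(String[] args) {
        List<String> keys1 = Arrays.asList("ABC", "DEF");
        List<String> keys2 = Arrays.asList("DEF", "GHI", "DEF");

        System.out.println(secondTraversalFails(keys1)); // prints true
        System.out.println(countMatches(keys1, keys2));  // prints 2
    }
}
```

This is why the methods below take Collections and call `stream()` on them as needed, rather than accepting `Stream` parameters.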

Here's my setup:

public static void main(String[] args) {
    Collection<List<String>> list1 = new ArrayList<>(5);
    Collection<List<String>> list2 = new ArrayList<>(5);

    list1.add(Arrays.stream(new String[] {"ABC", "123", "456"}).collect(Collectors.toList()));
    list1.add(Arrays.stream(new String[] {"DEF", "234", "567"}).collect(Collectors.toList()));
    list1.add(Arrays.stream(new String[] {"GHI", "345", "678"}).collect(Collectors.toList()));
    list2.add(Arrays.stream(new String[] {"ABC", "789", "012"}).collect(Collectors.toList()));
    list2.add(Arrays.stream(new String[] {"DEF", "890", "123"}).collect(Collectors.toList()));
    list2.add(Arrays.stream(new String[] {"GHI", "901", "234"}).collect(Collectors.toList()));
}

Now, depending on how much checking you need, and assuming you know that both lists have entries with matching keys so that most error checks can be skipped (really, a Map would be a better fit here than two Collections):

private static Stream<List<String>> combineTwoStreams(Collection<List<String>> list1,
        Collection<List<String>> list2) {
    Map<String, List<String>> newCollection = new LinkedHashMap<>();

    // Run through each entry in list1
    list1.stream().forEach(row -> {
        // Find the entries in list2 that have the same 'key' and add them to newCollection
        list2.stream().filter(row2 -> row.get(0).equals(row2.get(0))).forEach(row2 -> {
            String key = row.get(0);

            // Pull from newCollection
            List<String> fromNewCollection = newCollection.get(key);
            // If there wasn't an entry in newCollection for the key, add one now and pre-populate
            // with list1's data
            if (Objects.isNull(fromNewCollection)) {
                fromNewCollection = new LinkedList<>(row);
                newCollection.put(key, fromNewCollection);
            }

            // Add list2's data to the new collection (note that we could use addAll instead)
            row2.subList(1, row2.size()).forEach(fromNewCollection::add);
        });
    });

    // Return a Stream of our values
    return newCollection.values().stream();
}

Now, if you also have to account for rows in list2 whose key has no match in list1, you probably want this:

private static Stream<List<String>> combineTwoStreams(Collection<List<String>> list1, Collection<List<String>> list2) {
    Map<String, List<String>> newCollection = new LinkedHashMap<>();

    // Seed the map with a copy of each list1 row so that the input lists are not modified
    list1.forEach(row -> newCollection.put(row.get(0), new LinkedList<>(row)));
    list2.forEach(row -> {
        String key = row.get(0);

        // Pull from newCollection
        List<String> fromNewCollection = newCollection.get(key);
        if (Objects.isNull(fromNewCollection)) {
            // No list1 row has this key, so add a new entry pre-populated with list2's row
            newCollection.put(key, new LinkedList<>(row));
        } else {
            // Add list2's data to the existing entry (note that we could use addAll instead)
            row.subList(1, row.size()).forEach(fromNewCollection::add);
        }
    });

    // Return a Stream of our values
    return newCollection.values().stream();
}

If you want to prevent duplicates, a LinkedHashSet works well: declare newCollection as Map<String, Set<String>> newCollection = new LinkedHashMap<>(); and create a LinkedHashSet for each entry instead of a LinkedList.
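
The LinkedHashSet variant mentioned above might look roughly like the sketch below. It assumes non-empty rows of `String`, and the names `DedupMerger` and `combineWithoutDuplicates` are hypothetical. Note that a Set drops every repeated value within a merged row, not only the repeated key.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Stream;

class DedupMerger {

    // Merge rows by their first element, keeping each distinct value once per merged row.
    static Stream<Set<String>> combineWithoutDuplicates(Collection<List<String>> list1,
            Collection<List<String>> list2) {
        Map<String, Set<String>> newCollection = new LinkedHashMap<>();

        for (Collection<List<String>> source : Arrays.asList(list1, list2)) {
            for (List<String> row : source) {
                // Create the set on first sight of the key, then add the whole row;
                // the key itself is silently skipped when it is already present
                newCollection.computeIfAbsent(row.get(0), k -> new LinkedHashSet<>()).addAll(row);
            }
        }

        // Return a Stream of our values
        return newCollection.values().stream();
    }

    public static void main(String[] args) {
        List<List<String>> list1 = Arrays.asList(
                Arrays.asList("ABC", "123", "456"),
                Arrays.asList("DEF", "234", "567"));
        List<List<String>> list2 = Arrays.asList(
                Arrays.asList("ABC", "123", "789"),
                Arrays.asList("DEF", "890", "123"));

        // prints [ABC, 123, 456, 789] and then [DEF, 234, 567, 890, 123]
        combineWithoutDuplicates(list1, list2).forEach(System.out::println);
    }
}
```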

And lastly, if you like to make yourself cross-eyed:

private static Stream<List<Object>> combineTwoStreams(Collection<List<String>> list1, Collection<List<String>> list2) {
    // Turn our first list into a map
    Map<String, List<String>> map1 =
            list1.stream().collect(Collectors.toMap(row1 -> row1.get(0), row1 -> row1));
    // Turn our second list into a map
    Map<String, List<String>> map2 =
            list2.stream().collect(Collectors.toMap(row2 -> row2.get(0), row2 -> row2.subList(1, row2.size())));

    return Stream
            .of(map1, map2)
            // Give me the EntrySets as a stream
            .flatMap(m -> m.entrySet().stream())
            // Collect it all down
            .collect(
            // Group together things by the "key" so that we have a Map<String,List<List<String>>>
                    Collectors.groupingBy(
                            Map.Entry::getKey,
                            Collectors.mapping(Map.Entry::getValue,
                            // Merge the lists within each list down so that we have
                            // Map<String,List<String>>
                                    Collectors.collectingAndThen(
                                            Collectors.toList(),
                                            row -> row.stream().flatMap(Collection::stream)
                                                    .collect(Collectors.toList())))))
            // Get the values from our Map
            .values()
            // Return the stream of List<String>
            .stream();
}
DanO
  • I'm sorry but I have to downvote. I cannot encourage the use of `.forEach`. – Tunaki Apr 19 '16 at 18:09
  • I was *trying* to stick to stream or iterator functions as much as possible. There are much better methods than this, but figured that the user wanted to use mostly stream or iterator functions. – DanO Apr 19 '16 at 18:18
  • But that's the matter. `forEach` should not be encouraged when you can do the same thing with a proper reduction process, which is parallel friendly, unlike `forEach`. `forEach` is the bridge between the functional programming of the Stream API and the classic imperative programming. If you use that, might as well drop the Stream API and use good-old `for` loops. And I'm not saying using the classic imperative programming is bad, what I'm saying is that mixing the two concepts in the same code is bad. – Tunaki Apr 19 '16 at 18:20
  • You cannot safely reduce this without interim steps or a forEach. I could if I just did the OP's "if I cannot do the preferred output, I'll accept this", but would rather not give the subpar solution. Also, there's nothing inherently evil or bad about forEach and he has to iterate (which is a for-each operation) over both sets anyway. The first solution shows a filter. We could filter in the second, etc, but it was, again, to show more stream/iterator operations. If you have better, I'd love to see something that doesn't foreach or contain interim steps -- would be good to learn from. – DanO Apr 19 '16 at 18:24
  • Yes, yes you can :). But the logic is different. See my comment [here](http://stackoverflow.com/questions/36724305/streams-merge-2-lists/36726210?noredirect=1#comment61035551_36724305) or the even better solution of zipping the two Streams together (as proposed by the linked question). The moral is: with the Stream API, you want to use a reduction. If what you want to do cannot comply with that, then don't use the Stream API. (and this is not my moral, although I agree with it, this is [documentation itself](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html)) – Tunaki Apr 19 '16 at 18:26
  • Yes, it just occurred to me before you responded that you *could* map each and merge the maps. I'll add that as the preferred option to this post. – DanO Apr 19 '16 at 18:30
  • There. Honestly, the third one is so overly complex that I find it would be hard to maintain, hard to explain, and it is, for me, anathema. But, since you're hell-bent on hating on forEach... there it is. – DanO Apr 19 '16 at 19:56
  • (and I know why you didn't do it yourself...) – DanO Apr 19 '16 at 19:56
  • This is not how I would have done it. I would have done it like this:https://ideone.com/tQJrti (your version is indeed quite complicated) – Tunaki Apr 19 '16 at 19:59
  • And I insist, It is not me who is hell-bent with forEach, it is in the [doc itself](https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html): *Many computations where one might be tempted to use side effects **can be more safely and efficiently expressed without side-effects, such as using reduction instead of mutable accumulators**. However, side-effects such as using println() for debugging purposes are usually harmless. A small number of stream operations, such as forEach() and peek(), can operate only via side-effects; **these should be used with care**.* – Tunaki Apr 19 '16 at 20:02
  • I don't know... I shudder -- all are really horrible and a standard for-each is considerably better and cleaner for this task. But, if the exercise is to use stream functions, the OP now has four examples of streams (three of mine and yours). Just because we can use mostly stream, doesn't mean that we always should. – DanO Apr 19 '16 at 20:31
  • Also, yes, forEach can have side-effects, *but* the ways it was used above are devoid any side effects. It's like saying, "don't use memcpy"... no... memcpy isn't evil, but you do have to know how to use it. Down-voting just because you don't like a function is not good (if it were truly "bad" Oracle wouldn't have put it in). – DanO Apr 19 '16 at 20:31
  • Oh, and I was trying to avoid anonymous blocks in my last example, which really made it complicated. A method reference would have made things nice. – DanO Apr 19 '16 at 20:36
  • Well, there are certainly side-effects in your solution, `newCollection` (just to cite one). It's not your fault, `forEach` can only operate via side-effects. And I don't find the linked ideone code horrible, but arguably, this can come down to preference. What doesn't come down to preference is mixing the two concepts: your first examples can be written with a good `for` (that's my point) without crossing functional / imperative approach. And the docs make that clear. – Tunaki Apr 19 '16 at 20:38
  • From the start I stated that I could have used a Collect and done an addAll, which, according to the docs, would have prevented any harmful side effects with multithreading (although we weren't using parallel streams anyway and those are discouraged unless you use your own Executor). As for the extra list, that is just much cleaner than an (expensive) anonymous like you did or the ugly mess that I did to avoid it. Really, this task could benefit from method references and/or for-each. – DanO Apr 19 '16 at 20:49
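
For comparison, the plain-loop version that both commenters allude to could look something like the following. It is only a sketch, assuming non-empty `List<String>` rows keyed by their first element; `LoopMerger` and `merge` are illustrative names, not code from the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LoopMerger {

    // Imperative merge: no streams, no lambdas, input rows are left untouched.
    static List<List<String>> merge(List<List<String>> list1, List<List<String>> list2) {
        Map<String, List<String>> merged = new LinkedHashMap<>();

        // Copy every list1 row into the map under its key
        for (List<String> row : list1) {
            merged.put(row.get(0), new ArrayList<>(row));
        }

        // Append list2's values to the matching row, or start a new row if the key is new
        for (List<String> row : list2) {
            List<String> target = merged.get(row.get(0));
            if (target == null) {
                merged.put(row.get(0), new ArrayList<>(row));
            } else {
                target.addAll(row.subList(1, row.size()));
            }
        }

        return new ArrayList<>(merged.values());
    }

    public static void main(String[] args) {
        List<List<String>> list1 = Arrays.asList(
                Arrays.asList("ABC", "123", "456"),
                Arrays.asList("DEF", "234", "567"));
        List<List<String>> list2 = Arrays.asList(
                Arrays.asList("ABC", "789", "012"),
                Arrays.asList("XYZ", "000", "111"));

        // prints [[ABC, 123, 456, 789, 012], [DEF, 234, 567], [XYZ, 000, 111]]
        System.out.println(merge(list1, list2));
    }
}
```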