1

I have something like below written as an "old-school" version where I am trying to collect and group similarities in a given List. I would like to transfer this solution into, I think better, functional way with grouping or something more java 8 stream way.

Actually I would like to get rid of initialization for List<List<T>> listOfLists = new LinkedList<List<T>>(); and passing of this instance to getMatchedList method.

Here is my current solution:

import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Optional;

public class HelloWorld {

    public static void main(String... args) {

        List<Integer> originalList = Arrays.asList(1, 2, 4, 5, 7);

        List<List<Integer>> listOfLists = new LinkedList<>();
        originalList.stream()
                .forEach(item -> {
                    Optional<List<Integer>> list = getMatchedList(item, listOfLists);
                    if (list.isPresent()) {
                        list.get().add(item);
                    } else {
                        listOfLists.add(new LinkedList<>(Arrays.asList(item)));
                    }
                });
        System.out.println(listOfLists);
    }

    static Optional<List<Integer>> getMatchedList(Integer item, List<List<Integer>> list) {
        return list.stream().filter(a -> matched(a, item)).findAny();
    }

    static boolean matched(List<Integer> list, Integer b) {
        return list.stream().filter(x -> isSimilar(x, b)).findAny().isPresent();
    }

    static boolean isSimilar(Integer a, Integer b) {
        return Math.abs(a - b) <= 1; // true or false based on additional logic
    }

}

Let's say isSimilar function is Math.abs(a - b) <= 1. As a result I would like to have list of lists as follow:

[
   [1,2],
   [4,5],
   [7]
]

Note: For example if we have 1,2,3 we should have list: [1,2,3] despite of that 1 and 3 are not similar. But they are "connected" with common similarity which is 2.

Note II: The question is not about the numbers. Numbers are just for simplicity. Actually I have custom objects not Integers and I also have different isSimilar function. The goal is to achieve this "grouping" to clusters in java-8 functional way.

VladoDemcak
  • 4,893
  • 4
  • 35
  • 42
  • 3
    What if *a* is similar to *b* and *b* is similar to *c* but *a* is not similar to *c*? – shmosel Aug 07 '17 at 16:01
  • @shmosel if we have `1,2,3` where `1` is `a`, `2` is `b` and `3` is `c` based on your note we should get one list as follow: `[1,2,3]` – VladoDemcak Aug 07 '17 at 16:02
  • 4
    Does your code do that? – shmosel Aug 07 '17 at 16:03
  • I think it does ... I updated code b/c i had mistakes in getMatchedList function. Code should work I added working sample. – VladoDemcak Aug 07 '17 at 16:18
  • But 1 and 3 are not similar, so how can they be in the same set? – Andreas Aug 07 '17 at 16:20
  • You are right `1` and `3` are not similar but they are "connected" with `2`. List is something like cluster. – VladoDemcak Aug 07 '17 at 16:23
  • It sounds like for every given position in the original list ```x``` you want to create a list of any other number ```y``` in the original list as long as the original list contains all numbers in the range ```[min(x,y) ... max(x,y)]```. If so please state this clearly in your question; currently your explanation is cryptic at best but I think is actually erroneous (e.g. I think you meant ```Math.abs(x - y)``` rather than ```Math.abs(x, y)```. – Valentin Ruano Aug 07 '17 at 17:19
  • Actually a simpler way to state the above, you want to create list of maximal "runs" of consecutive numbers. – Valentin Ruano Aug 07 '17 at 17:22
  • I don't think the most efficient solution would be based on Streams at least not fully. Best is to sort the numbers (perhaps using a stream if you like) and then just iterate once across the sorted list find out where the discontinuities are dumping the run right before the discontinuity into the output list. – Valentin Ruano Aug 07 '17 at 17:25
  • @ValentinRuano yes, you are right. Actually I had correct `Math.abs(a - b)` in the implementation but not in the description. BTW it's not about the numbers. Numbers are just for simplicity. Actually I have custom objects not `Integers` and I also have different `isSimilar` function. The goal is to achieve this "grouping" to clusters in `java-8` functional way. – VladoDemcak Aug 07 '17 at 17:26
  • @VladoDemcak.... so we can NOT assume that isSimilar is transitive correct? only that is reflexible and reciprocal? – Valentin Ruano Aug 07 '17 at 17:28
  • @ValentinRuano `isSimilar` is some function which returns `boolean` based on relationship we have between two items. This function has to define measurable relationship between two items. – VladoDemcak Aug 07 '17 at 17:34
  • 3
    Your result relies on the order, i.e. if the input is `1, 2, 4, 5, 7, 3, 6`, you don’t get `[[1, 2, 3, 4, 5, 6, 7]]`, but rather `[[1, 2, 3], [4, 5, 6], [7]]`. Stream operation are usually not suitable for such order dependent operations or require additional effort to be correct. Besides, I recommend reading [When to use LinkedList over ArrayList?](https://stackoverflow.com/q/322715/2711488)… – Holger Aug 07 '17 at 17:48
  • 1
    If ```isSimilar``` is not a transitive function, meaning that ```isSimilar(x, y) && isSimilar(y,z)``` does not imply ```isSimilar(x, z)``` then would be rather difficult to get a decent implementation using Streams. If ```isSimilar``` is transitive, the you could use ```reduce``` to keep adding elements to a set of similarity/"equivalence" groups. – Valentin Ruano Aug 07 '17 at 18:44
  • Streams are appropriate when each element can be processed independently. But that isn't the case here; the handling of each element depends on the content of the whole stream. So, don't use streams for this class of problem. – erickson Aug 07 '17 at 23:56

1 Answers1

2

I understand what you want to do in OP: get rid of initialization for List<List<T>> listOfLists = new LinkedList<List<T>>()... but don't know why. Here is my answer, maybe is not exactly what you want. First I would like improve a bit of readability from my personal aspect:

public void test_45551110() {
    List<Integer> originalList = Arrays.asList(1, 2, 4, 5, 7);

    List<List<Integer>> listOfLists = new ArrayList<>();
    originalList.stream().forEach(item -> getMatchedList(item, listOfLists).add(item));
    System.out.println(listOfLists);
}

static List<Integer> getMatchedList(Integer item, List<List<Integer>> listOfLists) {
    return listOfLists.stream().filter(a -> matched(a, item)).findAny().orElseGet(() -> {
        List<Integer> list = new ArrayList<>();
        listOfLists.add(list);
        return list;
    });
}

Secondly, if the original elements in the list can be sorted based on some property. Your concerns can be easily resolved by StreamEx

List<List<Integer>> res = StreamEx.of(Arrays.asList(1, 2, 5, 7, 3))
        .sorted().collapse((a, b) -> isSimilar(a, b), Collectors.toList()).toList();
System.out.print(res); // output: [[1, 2, 3], [5], [7]]

I think most of people may be concern the time complexity n ^ 2 more than getting rid of initialization for List<List<T>> listOfLists = new LinkedList<List<T>>(). if you could share the real data structure and logic in isSimilar, we may be able to figure out the real/better solution for your question.

123-xyz
  • 619
  • 4
  • 5