I have the following data:

uuid    id1 id2 hId hName       percent golden
1       J   K   a   fetchflow   38%     34%
2       J   K   b   codelibs1   45%     34%
3       J   K   c   codelibs2   97%     34%
10      K   L   a   fetchflow   16%     10%
11      K   L   b   codelibs1   95%     10%
12      K   L   c   codelibs2   12%     10%
13      K   M   a   fetchflow   64%     14%
14      K   M   b   codelibs1   53%     14%
15      K   M   c   codelibs2   48%     14%

And want to get to this:

Compare To  Golden  a   b   c
J       K   34%     38% 45% 97%
K       L   10%     16% 95% 12%
K       M   14%     64% 53% 48%

Note: Pair(id1, id2) == Pair(id2, id1), so they're interchangeable.

I want to store it in the following Java data structure:

class Foo {
    int id1;
    int id2;
    double golden;
    /*
        [a -> 0.38,
        b -> 0.45,
        c -> 0.97]
    */
    Map<Integer, Double> comparisons;
}

I currently have the following code, but I can't map its result to the data structure that I want:

comparisons
        .stream()
        .collect(
                groupingBy(
                        Function.identity(),
                        () -> new TreeMap<>(
                                Comparator.<ComparisonResultSet, Integer>comparing(o -> o.vacancy_id_1).thenComparing(o -> o.vacancy_id_2)
                        ),
                        collectingAndThen(
                                reducing((o, o2) -> o), Optional::get
                        )
                ));
haisi
    Your comparator doesn’t make `id1` and `id2` interchangeable. – Holger Nov 11 '16 at 09:44
  • 1: Yeah, I noticed that. I will probably use an interchangeable tuple for id1 and id2. 2: `new Foo(J, K, 0.34, [a=0.38; b=0.45; c=0.97]`; `new Foo(K, L, 0.1, [a=0.16; b=0.95; c=0.12]`; `new Foo(K, M, 0.14, [a=0.64; b=0.53; c=0.48]` – haisi Nov 11 '16 at 09:50

2 Answers

One solution, or rather starting point, would be

List<Foo> result = list.stream().collect(Collectors.collectingAndThen(
    Collectors.groupingBy(
            o -> Arrays.asList(o.vacancy_id_1, o.vacancy_id_2),
            Collectors.toMap(o -> o.hId, o -> Arrays.asList(o.percent, o.golden))),
    m -> m.entrySet().stream().map(e -> new Foo(
            e.getKey().get(0), e.getKey().get(1),
            e.getValue().values().stream().mapToDouble(l->l.get(1))
                    .reduce((a,b)->{assert a==b; return a; }).getAsDouble(),
            e.getValue().entrySet().stream()
                    .collect(Collectors.toMap(Map.Entry::getKey, en->en.getValue().get(0)))
    )).collect(Collectors.toList())
));

This uses only standard collection classes, which complicates matters. It groups by Arrays.asList(o.vacancy_id_1, o.vacancy_id_2), which implies an ordering of the IDs. You could wrap the list in new HashSet<>(…) to get an order-independent key; however, that complicates the construction of the Foo instances, because a dedicated id1 and id2 are required. That is:

List<Foo> result = list.stream().collect(Collectors.collectingAndThen(
    Collectors.groupingBy(
            o -> new HashSet<>(Arrays.asList(o.vacancy_id_1, o.vacancy_id_2)),
            Collectors.toMap(o -> o.hId, o -> Arrays.asList(o.percent, o.golden))),
    m -> m.entrySet().stream().map(e -> {
        Iterator<Integer> it = e.getKey().iterator();
        return new Foo(
            it.next(), it.next(),
            e.getValue().values().stream().mapToDouble(l->l.get(1))
                    .reduce((a,b)->{assert a==b; return a; }).getAsDouble(),
            e.getValue().entrySet().stream()
                    .collect(Collectors.toMap(Map.Entry::getKey, en->en.getValue().get(0)))
        );
    }).collect(Collectors.toList())
));

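Note that in Java 9 and later, new HashSet<>(Arrays.asList(o.vacancy_id_1, o.vacancy_id_2)) could be replaced by Set.of(o.vacancy_id_1, o.vacancy_id_2), provided the two IDs always differ, since Set.of throws an IllegalArgumentException for duplicate elements.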

A dedicated order-independent pair type would make the operation simpler, especially if you replace the two id properties with a single property of that type in both the source and the result type, right from the start.

Another obstacle is the “golden” property. Without it, the downstream collector would be Collectors.toMap(o -> o.hId, o -> o.percent), producing exactly the desired map for the Foo result. Since we have to carry another property here, the map needs a subsequent conversion step after the “golden” property has been reduced to a single value.
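
To illustrate the simpler case just described, here is a minimal sketch that ignores “golden” entirely. The Row class and its field names are hypothetical stand-ins for the question's ComparisonResultSet, and the String-typed ids follow the sample data rather than the int fields of Foo.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupWithoutGolden {
    // Hypothetical row type; stands in for the question's ComparisonResultSet.
    static final class Row {
        final String id1, id2, hId;
        final double percent;

        Row(String id1, String id2, String hId, double percent) {
            this.id1 = id1;
            this.id2 = id2;
            this.hId = hId;
            this.percent = percent;
        }
    }

    // Groups rows by the ordered (id1, id2) list and maps each hId to its percent.
    static Map<List<String>, Map<String, Double>> group(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> Arrays.asList(r.id1, r.id2),
                Collectors.toMap(r -> r.hId, r -> r.percent)));
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("J", "K", "a", 0.38),
                new Row("J", "K", "b", 0.45),
                new Row("K", "L", "a", 0.16));
        // Each inner map already has the shape of Foo's "comparisons" map.
        System.out.println(group(rows));
    }
}
```

Note that Collectors.toMap without a merge function throws an IllegalStateException if the same hId occurs twice within one group, which may be the desired behavior for data like this.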

Using a pair class like

public final class UnorderedPair<T> {
    public final T a, b;

    public UnorderedPair(T a, T b) {
        this.a = a;
        this.b = b;
    }
    public int hashCode() {
        return a.hashCode()+b.hashCode()+UnorderedPair.class.hashCode();
    }
    public boolean equals(Object obj) {
        if(this == obj) return true;
        if(!(obj instanceof UnorderedPair)) return false;
        final UnorderedPair<?> other = (UnorderedPair<?>) obj;
        return a.equals(other.a) && b.equals(other.b)
            || a.equals(other.b) && b.equals(other.a);
    }
}

and the pairing collector from this answer, we get

List<Foo> result = list.stream().collect(Collectors.collectingAndThen(
    Collectors.groupingBy(
        o -> new UnorderedPair<>(o.vacancy_id_1, o.vacancy_id_2),
            pairing(
                Collectors.toMap(o -> o.hId, o -> o.percent),
                Collectors.reducing(null, o -> o.golden,
                    (a,b) -> {assert a==null || a.doubleValue()==b; return b; }),
            (m,golden) -> new AbstractMap.SimpleImmutableEntry<>(m,golden))),
    m -> m.entrySet().stream().map(e -> new Foo(
        e.getKey().a, e.getKey().b, e.getValue().getValue(), e.getValue().getKey()))
    .collect(Collectors.toList())
));

but, as said, having a single property of the unordered pair type in source and result would simplify the task much more.
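
On Java 12 or later, the JDK's Collectors.teeing can play the role of the pairing collector. The following is a sketch under assumptions, not the code from the linked answer: Row and this Foo are hypothetical classes with String ids (matching the sample data), and the key is made order-independent by sorting the two ids instead of using UnorderedPair. It also assumes that “golden” is identical within a group and simply keeps the first value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class TeeingSketch {
    // Hypothetical source row; stands in for the question's ComparisonResultSet.
    static final class Row {
        final String id1, id2, hId;
        final double percent, golden;

        Row(String id1, String id2, String hId, double percent, double golden) {
            this.id1 = id1; this.id2 = id2; this.hId = hId;
            this.percent = percent; this.golden = golden;
        }
    }

    // Hypothetical result type, modeled on the question's Foo.
    static final class Foo {
        final String id1, id2;
        final double golden;
        final Map<String, Double> comparisons;

        Foo(String id1, String id2, double golden, Map<String, Double> comparisons) {
            this.id1 = id1; this.id2 = id2;
            this.golden = golden; this.comparisons = comparisons;
        }
    }

    // Order-independent key: the lexicographically smaller id always comes first.
    static List<String> key(Row r) {
        return r.id1.compareTo(r.id2) <= 0
                ? Arrays.asList(r.id1, r.id2)
                : Arrays.asList(r.id2, r.id1);
    }

    static List<Foo> toFoos(List<Row> rows) {
        // Collects hId -> percent for one group.
        Collector<Row, ?, Map<String, Double>> percents =
                Collectors.toMap(r -> r.hId, r -> r.percent);
        // Keeps the first golden value of a group (assumed equal for all its rows).
        Collector<Row, ?, Optional<Double>> golden =
                Collectors.mapping(r -> r.golden, Collectors.reducing((a, b) -> a));
        // Runs both collectors over the same rows and pairs their results.
        Collector<Row, ?, Map.Entry<Map<String, Double>, Double>> both =
                Collectors.teeing(percents, golden, (m, g) -> Map.entry(m, g.get()));

        Map<List<String>, Map.Entry<Map<String, Double>, Double>> grouped =
                rows.stream().collect(Collectors.groupingBy(TeeingSketch::key, both));

        List<Foo> result = new ArrayList<>();
        grouped.forEach((k, v) ->
                result.add(new Foo(k.get(0), k.get(1), v.getValue(), v.getKey())));
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("J", "K", "a", 0.38, 0.34),
                new Row("K", "J", "b", 0.45, 0.34), // reversed ids, same group
                new Row("K", "L", "a", 0.16, 0.10));
        for (Foo f : toFoos(rows)) {
            System.out.println(f.id1 + " " + f.id2 + " " + f.golden + " " + f.comparisons);
        }
    }
}
```

Because groupingBy produces a HashMap by default, the order of the resulting list is unspecified; passing a TreeMap or LinkedHashMap supplier to groupingBy would be one way to control it.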

Holger
    Attention: At least IntelliJ isn't capable of inferring the correct types in the second to last line `.map(e -> ....` but the code still compiles and runs properly. – haisi Nov 11 '16 at 12:27
I assumed that golden is the same for every row with a given (id1, id2) pair, and that id1 and id2 are interchangeable.

How about this:

list.stream().collect(Collectors.collectingAndThen(Collectors.groupingBy(struct -> {
        String first = struct.getId1();
        String second = struct.getId2();

        if (first.compareTo(second) > 0) {
            return ImmutableList.of(first, second, struct.getGolden());
        }
        return ImmutableList.of(second, first, struct.getGolden());

    }, Collectors.toMap(Structure::getHId, Structure::getPercentage)),
            elem -> elem.entrySet().stream().map(entry -> {
                ImmutableList<?> values = entry.getKey();
                return new Foo((String) values.get(0), (String) values.get(1), (Integer) values.get(2),
                        entry.getValue());
            }).collect(Collectors.toList())));

That interchangeable key makes things a bit ugly.

Eugene