Seeking Optimization Suggestions for Map-to-Map Conversion

Question

I'm seeking feedback as to whether there's a more efficient approach than what I'm doing in my code shown at the bottom.

Basically, given this map:

        Set<String> A_Set = new HashSet<>(Arrays.asList("1111", "2222", "5555"));
        Set<String> B_Set = new HashSet<>(Arrays.asList("3333", "4444"));
        Set<String> C_Set = new HashSet<>(Arrays.asList("6666"));
        Set<String> D_Set = new HashSet<>(Arrays.asList("2222", "5555", "6666"));

        Map<String, Set<String>> values = new HashMap<>();
        values.put("A", A_Set);
        values.put("B", B_Set);
        values.put("C", C_Set);
        values.put("D", D_Set);

which looks like this:

How do I create a Map<String, List<Boolean> map such that it looks like this:

In the most efficient way possible. My real Map has thousands of values per Set, but there are only ever 4 Sets (A, B, C, D).

Here's my current code. Can you think of a more efficient approach?

import java.util.*;

public class MapToMap {

    public static void main(String[] args) {
        Set<String> A_Set = new HashSet<>(Arrays.asList("1111", "2222", "5555"));
        Set<String> B_Set = new HashSet<>(Arrays.asList("3333", "4444"));
        Set<String> C_Set = new HashSet<>(Arrays.asList("6666"));
        Set<String> D_Set = new HashSet<>(Arrays.asList("2222", "5555", "6666"));

        Map<String, Set<String>> values = new HashMap<>();
        values.put("A", A_Set);
        values.put("B", B_Set);
        values.put("C", C_Set);
        values.put("D", D_Set);

        Map<String, List<Boolean>> exists = new HashMap<>();

        for (Map.Entry<String, Set<String>> v : values.entrySet()) {
            for (String val : v.getValue()) {
                if (exists.containsKey(val)) {
                    List<Boolean> list = exists.get(val);
                    list = addValue(v.getKey(), list);
                    exists.put(val, list);
                } else {
                    List<Boolean> newList = new ArrayList<>(Arrays.asList(false, false, false, false));
                    newList = addValue(v.getKey(), newList);
                    exists.put(val, newList);
                }
            }
        }
        for (Map.Entry<String, List<Boolean>> s : exists.entrySet()) {
            System.out.println(s);
        }
    }

    private static List<Boolean> addValue(String key, List<Boolean> listToUse) {
        List<Boolean> newList = new ArrayList<>();
        if (Objects.equals("A", key)) {
            newList.addAll(Arrays.asList(true, listToUse.get(1), listToUse.get(2), listToUse.get(3)));
        } else if (Objects.equals("B", key)) {
            newList.addAll(Arrays.asList(listToUse.get(0), true, listToUse.get(2), listToUse.get(3)));
        } else if (Objects.equals("C", key)) {
            newList.addAll(Arrays.asList(listToUse.get(0), listToUse.get(1), true, listToUse.get(3)));
        } else if (Objects.equals("D", key)) {
            newList.addAll(Arrays.asList(listToUse.get(0), listToUse.get(1), listToUse.get(2), true));
        }
        return newList;
    }
}

If the code is working, then this belongs on [codereview.se] — Jim Garrison, May 10 '19 at 04:15
@Manh yes, the keys (A,B,C,D) are known before hand. The values of the sets (ex: 1111, 2222, etc.) are *not* known beforehand. — Casey B., May 10 '19 at 11:01

score 1 · Answer 1 · answered May 10 '19 at 04:38

1

Here's a solution using streams:

Map<String, List<Boolean>> exists = values.values()
        .stream()
        .flatMap(Set::stream)
        .distinct()
        .collect(Collectors.toMap(v -> v, v -> Stream.of("A", "B", "C", "D")
                .map(k -> values.get(k).contains(v))
                .collect(Collectors.toList())));

Ideone Demo

answered May 10 '19 at 04:38

shmosel

49,289
6
73
138

thank you but it appears Streams take twice as long (6 seconds vs. 3 seconds for non-streams) in my test here: https://ideone.com/2XZygo – Casey B. May 10 '19 at 11:28
Try parallel streams. – shmosel May 10 '19 at 15:03
I'm quite hesitant to implement parallelism given the expertise on the topic here: https://stackoverflow.com/a/23370799/4731712 advising: `But the reality is quite complicated. So until you achieve experthood, first identify when sequential processing is actually costing you something, and then measure if parallelism will help.` I don't want to prematurely optimize an algorithm that might not need it. I was seeking simple improvements on my code but I don't think there's much "easy" room for improvement, appreciate your time. – Casey B. May 10 '19 at 19:58
The whole question was how to optimize your code. How is this premature? – shmosel May 10 '19 at 20:02
Perhaps that was the wrong term, but given the expertise seemingly required to successfully `(1)` identify if parallelism is required and `(2)` ensuring its implemented appropriately is outside the scope of what I'm seeking. I'm not saying it won't help, but I can't justify the investment to find out. Especially given Brian's expertise advice quoted in the next comment. – Casey B. May 10 '19 at 20:07
`It is best to develop first using sequential execution and then apply parallelism where (A) you know that there's actually benefit to increased performance and (B) that it will actually deliver increased performance. (A) is a business problem, not a technical one. If you are a performance expert, you'll usually be able to look at the code and determine (B), but the smart path is to measure. (And, don't even bother until you're convinced of (A); if the code is fast enough, better to apply your brain cycles elsewhere.)` – Casey B. May 10 '19 at 20:07
I haven't accepted your current answer as it does not optimize my code, just FYI. Thanks regardless, you've put parallelism on my radar and helped me understand streams better. – Casey B. May 10 '19 at 20:10
It sounds to me like you're making exactly the type of mistake Brian is warning against, namely, making blanket assumptions without considering the particulars of a case. Let's look at his closing statement: *So until you achieve experthood, first identify when sequential processing is actually costing you something, and then measure if parallelism will help.* Is sequential processing costing you something? It certainly seems so. Did you measure if parallelism will help? If you do, and it does help, why would you not want to use it? This is practically a case study in when to try parallelism. – shmosel May 10 '19 at 20:20
I think you're exaggerating the level of expertise required to test parallelism. It's literally one line of code. The whole streams infrastructure was designed to provide parallelism out of the box, as long as you're doing everything by the book. It would be a shame to give up that power without a compelling reason. – shmosel May 10 '19 at 20:25
`Did you measure if parallelism will help? If you do, and it does help, why would you not want to use it?` I interpreted his point (A) above meaning don't optimize your code with parallel streams until you can test it in a business setting, as it's not a purely technical issue (one line of code added).I'd need to test on real servers, with real clients and load. I'm simply in developing mode and don't want to invest time into something that may not be an issue in reality. You're convincing me to .. – Casey B. May 10 '19 at 20:27
Not brush over the potential performance enhancements and invest more time to understand how it may help or hinder. If it so happens that I see an improvement then I'll come back and accept. – Casey B. May 10 '19 at 20:28
1

*don't optimize your code with parallel streams until you can test it in a business setting...* Fair point. I assumed your test roughly corresponded to your real-world needs. But if it doesn't, any potential solution carries a similar risk. Optimization is always a delicate process, whether or not it involves parallelism. – shmosel May 10 '19 at 20:31

Seeking Optimization Suggestions for Map-to-Map Conversion

1 Answers1