How to pair elements in two different sets?

Question

If there are two sets -

set1 - [tag, boLD, Link]
set2 - [BOLd, TAG, Badge, foo]

What could be the efficient algorithm for making pairs of elements like -

pairs = [tag, TAG], [boLD, BOLd], [Link, null], [null, Badge], [null, foo]

Notice the pairing is on the basis of case-insensitive names.

I want to avoid O(N^2), that looks up all elements in set1 iteratively, and look in the element in set2.

EDIT: I think if we can use Ternary Search Tries, to make symbol table implementation, where keys are elements from set1, and values from set2. set2 remaining elements could be dealt at last.

Why don't you tell us what you thought up so far? The good news is: O(N^2) is your worst case upper boundary ;-) — GhostCat, Apr 02 '15 at 12:21
as Sets does not support index access So unfortunately you need to lookup all elements !!! — Neeraj Jain, Apr 02 '15 at 12:23
Think of what other type of collection/data structure might help you out here. You'll benefit more from playing around with this stuff yourself than getting the answer — Steve, Apr 02 '15 at 12:24
I was thinking of implementing symbol table using Ternary search tries. — unknown_boundaries, Apr 02 '15 at 12:27
Why use a ternary search trie? What's wrong with a HashMap containing the lowercased-string as key and the string as value? — JB Nizet, Apr 02 '15 at 12:38
http://stackoverflow.com/questions/15985266/how-to-iterate-through-two-arraylists-simultaneously Should help. — underwater ranged weapon, Apr 02 '15 at 12:52

Grisha Weintraub · Accepted Answer · 2015-04-02T13:03:17.597

You can do it in O(n), if you use some data structure that supports O(1) get operations - for example HashMap.

    HashMap<String, String> set1 = new HashMap<>();
    HashMap<String, String> set2 = new HashMap<>();
    class Pair{
        String str1;
        String str2;

        Pair(String s1, String s2){
            str1 = s1;
            str2 = s2;
        }
    }
    Set <Pair> pairs = new HashSet<>();
    set1.put("tag", "tag");
    set1.put("bold", "boLD");
    set1.put("link", "Link");
    set2.put("tag", "TAG");
    set2.put("bold", "BOLd");
    set2.put("badge", "Badge");
    set2.put("foo", "foo");

    for (String s : set1.keySet()){
        if (set2.containsKey(s))
            pairs.add(new Pair(set1.get(s), set2.get(s)));
        else
            pairs.add(new Pair(set1.get(s), null));
    }

    for (String s : set2.keySet()){
        if (!set1.containsKey(s))
            pairs.add(new Pair(null, set2.get(s)));
    }

    for(Pair p : pairs)
        System.out.println(p.str1 + " " + p.str2);

Firstly I would not just print it. I need a data structure to return, and how can multiple null entries be present as keys in any data structure. — unknown_boundaries, Apr 02 '15 at 12:51
Instead of printing, you can insert the values to some data structure. See my edited post. — Grisha Weintraub, Apr 02 '15 at 13:03
Converting to lower case is more practicable than writing all possible permutations of case identities. — Martin Krämer, Apr 02 '15 at 23:06

score 1 · Answer 2 · answered Apr 02 '15 at 12:37

1

You could just create a HashMap and iterate over both arrays. For the key, you may use the lowercase representation of each string. This should give you linear runtime complexity and is very easy to implement.

answered Apr 02 '15 at 12:37

timcbaoth

669
1
7
12

score 0 · Answer 3 · answered Apr 02 '15 at 12:48

A simple solution (given in Python, too lazy to write Java now) which is linear in the lengths of the lists would be the following:

map1 = {}
for i,e in enumerate(set1):
    s = e.lower()
    map1[s] = i

map2 = {}
for i,e in enumerate(set2):
    s = e.lower()
    map2[s] = i

pairs = []
for i,e in enumerate(set1):
    s = e.lower()
    if s in map2:
        elem = (e, set2[map2[s]])
    else:
        elem = (e, None)
    pairs.append(elem)

for i,e in enumerate(set2):
    s = e.lower()
    if s not in map1:
        pairs.append((None, e))

I'm also assuming there are no duplicates in the input, i.e. either it matches 0 or 1 words from the other set and there is not the same word with different cases in each of the sets, as I'm not sure what you would expect your output to be then.

It's probably not the most efficient way to go about it, but seems fine overall as it's still O(n).

How to pair elements in two different sets?

3 Answers3