Python set comparison

Question

I have two sets, each containing numerous tuples:

s1 = set([('a','b','c'), ('d','e','f'), ('g','h','i'), ('j','k','l'), ('m','n','o')])
s2 = set([('a','y','z'), ('p','q','r'), ('s','t','u'), ('v','w','x')])

Each tuple contains a number of strings(in this case 3). Also each tuple has an id which is the first element. I want to check which tuple has the same id in both sets but different following values like (a,b,c) in s1 and (a,y,z) in s2 and output this.

Do you have to have the exact tuple to check if it's in the set using in and how do you then access this tuple to print it out?

*Also each tuple has an id which is the first element.* Sound to me like you really wanted to use `dict`s instead.. — Martijn Pieters, Jul 12 '12 at 09:15

score 2 · Answer 1 · answered Jul 12 '12 at 09:16

Do you have to have the exact tuple to check if it's in the set using in

Yes, you do. If only part of your items should be compared, then use dict, make the comparable parts the keys and the rest the values.

Starting from your example code,

d1 = dict((x[0], x) for x in s1)
# similarly, make d2 from s2

Then you can check for a in d1, grab the associated triple with d1[a], etc.

score 1 · Answer 2 · answered Jul 12 '12 at 09:22

I think converting your sets into dicts would ease the search:

>>> d1 = {t1[0]: (t1[1], t1[2]) for t1 in s1}
>>> d1
{'a': ('b', 'c'), 'j': ('k', 'l'), 'm': ('n', 'o'), 'd': ('e', 'f'), 'g': ('h', 'i')}
>>> d2 = {t2[0]: (t2[1], t2[2]) for t2 in s2}
>>> d2
{'a': ('y', 'z'), 'p': ('q', 'r'), 's': ('t', 'u'), 'v': ('w', 'x')}
>>> [(k2, d2[k2]) for k2 in d2 if k2 in d1 and d2[k2] != d1[k2]]
[('a', ('y', 'z'))]

score 0 · Answer 3 · answered Jul 12 '12 at 09:17

0

Convert your set into a dict, using the id as key of the dict:

s1d = {}
for e in s1:
  s1d[e[0]] = e

Then it should be easy.

answered Jul 12 '12 at 09:17

daniel kullmann

13,653
8
51
67

score 0 · Answer 4 · answered Jul 12 '12 at 09:19

0

Something like this can work:

for x in s1:
   templist = [y for y in s2 if y[0] == x[0] and x[1:] != y[1:]]
   if len(templist):
      print x,

answered Jul 12 '12 at 09:19

SexyBeast

7,913
28
108
196

1

If speed is an issue this will not do since it is `O(n^2)` – jamylak Jul 12 '12 at 09:24
Yeah, but all the methods which employ hashes are kind of similar, aren't they? The sets have to be converted into hashes, then a key has to be searched in another hash, ultimately a search for each key in another hash has to be implemented somehow. So can O(n^2) be avoided? – SexyBeast Jul 12 '12 at 09:30
The implementation is constant that isn't considered in the **growth**. I think you are talking about the time. – jamylak Jul 12 '12 at 09:33
I didn't quite get you. We are to take each element of one set, check to find whether it is there in the other one, if so, print it out. Since we are dealing with set, the elements are guaranteed to be unique, so there is no further scope of re-using the history to skip the search of some of the subsequent elements. So it has to be a O(n^2) search one way or the other, how can it be avoided? – SexyBeast Jul 12 '12 at 09:36
Having duplicates is not an issue when you simply want to check whether something exists in a set or not. In my solution I only check the keys in the intersection which is an average case `O(N)` operation since set average lookup time is `O(1)`, how is that `O(N^2)`? – jamylak Jul 12 '12 at 09:47
How is the intersection implemented internally? – SexyBeast Jul 12 '12 at 09:49
I don't know since I haven't looked into it but there is a link to the source code here: http://wiki.python.org/moin/TimeComplexity#set – jamylak Jul 12 '12 at 10:05
Yeah, see, the worst case scenario is O(len(set1)*len(set2)), which is just O(n^2) when the lengths of the sets are equal... – SexyBeast Jul 12 '12 at 10:09
The worst case senario will almost never occur. Read the comments here: http://stackoverflow.com/questions/8102478/intersection-complexity – jamylak Jul 12 '12 at 10:14
Ok, so in that case, I will simply change the order of the sets, putting the longer set in the inner loop, that will make the two same, right? – SexyBeast Jul 12 '12 at 10:18
I don't understand what you mean, make which two the same? – jamylak Jul 12 '12 at 10:19
By the way, that hardly makes an improvement. Either way, you are iterating over an outer array, while searching for that element in the inner array through a **in** search. m*n should be same as n*m, no? – SexyBeast Jul 12 '12 at 10:20
I was telling that the page you pointed out says that the implementation keeps the shorter array in the outer loop and the longer one inside. So I said that that can be simulated here by iterating over s2 outside and s1 inside. – SexyBeast Jul 12 '12 at 10:22
How can a hash-structure simplify the searching process? At best, it can reduce memory-usage, since multiple hashes may share a common key for different values (at least in Perl), but how can that hasten value lookup? – SexyBeast Jul 12 '12 at 10:32

jamylak · Answer 5 · 2012-07-12T09:27:38.123

As @MartijnPieters said you should be using dicts.

Here is the method by @larsmans as a solution.

>>> s1 = set([('a','b','c'), ('d','e','f'), ('g','h','i'), ('j','k','l'), ('m','n','o')])
>>> s2 = set([('a','y','z'), ('p','q','r'), ('s','t','u'), ('v','w','x')])
>>> d1 = dict((i, (a, b)) for i, a, b in s1)
>>> d2 = dict((i, (a, b)) for i, a, b in s2)
>>> [(k, d1[k], d2[k]) for k in set(d1).intersection(d2) if d1[k] != d2[k]]
[('a', ('b', 'c'), ('y', 'z'))]

score 0 · Answer 6 · answered Jul 12 '12 at 09:20

You want to use a dict instead of a set to store the items by their id:

def byid(tups):
    for t in tups:
        # yield key, value paris
        yield t[0], tuple(t[1:])

# make a dict from the pairs
seen = dict(byid(s1))

for key, vals in byid(s2):
    # now it's easy to find duplicates
    if key in seen:
        print seen[key], "vs", vals

Python set comparison

6 Answers6