Initialising a disjoint set using a for-loop in a dict constructor in Python

Question

Firstly, I've mostly done very trivial things with python, so I am not very comfortable with pythonic syntax. I am learning about disjoint sets, and have a basic understanding of the implementation.

I came across a python forest-implementation (not mine!) , and am having a lot of difficulty understanding how the dictionary in __init__ works. The wikipedia example is simple enough to follow (this implementation is based on the wiki's pseudocode). Particulary, this line is very alien to me

  def __init__(self, vertices):
    self.struct = dict((v, [v, 0]) for v in vertices)

I cannot see what this dict constructor is doing, and my guess is that for every number from 0 to vertices, a v is inserted into the dictionary as both a key and a value, but the key has a subset containing v and an additional 0 to it? I might be very wrong.

Also, as I was comparing the python code

   xV = self.struct[xRoot]
    yV = self.struct[yRoot]
    if xV[1] < yV[1]:
      xV[0] = yRoot
    elif xV[1] > yV[1]:
      yV[0] = xRoot
    else:
      yV[0] = xRoot
      xV[1] += 1

to wikipedia's pseudocode,

 if xRoot.rank < yRoot.rank
     xRoot.parent := yRoot
 else if xRoot.rank > yRoot.rank
     yRoot.parent := xRoot
 else
     yRoot.parent := xRoot
     xRoot.rank := xRoot.rank + 1

I was trying to understand how the [0]'s and [1]'s are used. I can only make the connection that the [0] comes from how we initialised the dictionary struct. Of course, based on mere comparison, it is simple enough to know what they are referring to, but how is this done with the use of 0's and 1's in the struct?

Also, If you need the entire code of which I am talking about, you can find it here.

emvee · Answer 1 · 2016-02-20T22:31:59.477

0

def __init__(self, vertices):
    self.struct = dict((v, [v, 0]) for v in vertices)

What happens here is that a dict is being built where the vertex is the key and maps to a list with two elements, element[0] being the vertex itself (.parent in the pseudo code) and element[1] being the .rank of the vertex in the pseudocode.

So in stead of building a struct with named fields they're mapped to indices in a Python list.

The list that a vertex maps to (that's what the dict does) contains two elements, the first one being some kind of vertex, its type does not matter. The second element of the list is initialized with 0 (see the [v, 0] part of the code), an integer.

Using list indexing by integers, ...[1] addresses that specific integer. And integers can be used to represent the rank of the vertex, and can easily be modified: ...[1] += 1 adds 1 to the current rank.

edited Feb 20 '16 at 22:31

answered Feb 20 '16 at 22:08

emvee

4,371
23
23

Ok, tuple's are another thing I will have to look into. So if I am getting this straight, [v,0] represents a tuple, where 0 is the key indicating that the v in that tuple is the vertex? A brief skim over tuples tells me that they are immutable, so that is what I gathered. If so, how is there a [1] to use if we hadn't initialised a tuple containing 1's? – moomin Feb 20 '16 at 22:13
Did I say `tuple`? Yes I said that. But I meant `list`. `list` elements are mutable and `list`s are constructed using the `[ ... ]` syntax. Apologies and I'll edit. (Indexing into tuples and lists is done the same way, with numerical indices ...) – emvee Feb 20 '16 at 22:17
That's fine, thanks. I am still confused as to how they're using the `[1]` to represent the `.rank` though. – moomin Feb 20 '16 at 22:27

score 0 · Accepted Answer · edited May 23 '17 at 12:23

I wrote the python code, and I am happy to relay my thoughts on the topic.

The basic idea is that dot calls are slow in Python as demonstrated here. Problematically, the proposed structure on the wiki requires dot calls:

def makeset(x):
  x.parent = x
  x.rank = 0

But, we don't want to do dot calls so we can replace parent and rank with a list whose first position is the parent and second position is the rank. If we were converting an initialized object from makeset to a list, we would get:

x = [x.parent, x.rank]

Now, we could put each x in a list, but that would make each call to x O(n) in the worst case. So, we put each x into a dictionary indexed by its own name to get O(1) look up. This is equivalent to:

dictofsets = {}
for x in vertices:
  dictofsets[x] = [x, 0]

And, this can be shortcut to computing a dict from a list of key-value pairs (Note that the immutable tuple is the key-value pair for the dict not the list of [parent, rank].):

dictofsets = dict([(x, [x, 0]) for x in vertices])

Or, a dictionary comprehension if one wants to do it even faster (Note that I used Python 2.6 syntax for reasons that are not clear to me. Both are included below; though the literal from 2.7 should be preferred for reasons specified here. I changed my code on the blog to accommodate this find.).

2.6

dictofsets = dict((x, [x, 0]) for x in vertices)

2.7

dictofsets = {x: [x, 0] for x in vertices}

Thus, using the original naming convention, one accesses the dict as follows:

dictofsets[x][0] #parent
dictofsets[x][1] #rank

Hopefully that makes more sense!

If you have any other questions, give me a shout.

M

Initialising a disjoint set using a for-loop in a dict constructor in Python

2 Answers2