4

What I have so far is largely based off page 571 of "Introduction To Algorithms" by Cormen et al.

I have a Node class in python that represents a set:

class Node:
    def __init__(self, parent, rank = 0):
        self.parent = parent
        self.rank = rank

This implementation is going to use a List of Nodes as the forest (I am open to better ways to store the sets).

Initialize() returns a list of Nodes, that I will store in variable set and pass into the other functions.

Find searches through the forest for the value and returns the set that it appears in. I chose to use for s in range(len(set)): so that in the recursion I could shrink the list of sets being passed in by set[s:].

def Find(set, value):
    for s in range(len(set)):
        if value != set[s].parent:
            set[s].parent = Find(set[s:], set[s].parent)
        return set[s]

Merge merges sets in the forest by finding them and promoting the higher ranked set.

def Merge(set, value1, value2):
    set_val_1 = Find(set, value1)
    set_val_2 = Find(set, value2)
    if set_val_1.rank > set_val_2.rank:
        set_val_2.parent = set_val_1
    else:
        set_val_1.parent = set_val_2
        if set_val_1.rank == set_val_2.rank:
            set_val_2.rank += 1

I am getting some errors when I do Finds and Merges, namely Find is not returning the proper set, and so I am not sure if Merge is working properly as well. I would appreciate some help to make sure the functions are implemented properly.

tshepang
  • 12,111
  • 21
  • 91
  • 136
Jordan
  • 4,928
  • 4
  • 26
  • 39

5 Answers5

3

Have you looked at any other existing implementations?

Brandon
  • 3,684
  • 1
  • 18
  • 25
2

I don't have the latest edition of the book, but this doesn't look quite like a disjoint-set forest.

I think your mistake is to think that the forest has to be stored in a collection and that you have to traverse this collection to do the operations on the nodes. Remove set from Merge() and Find() and implement Find() as

def Find(n):
    if n != n.parent:
        n.parent = Find(n.parent)
    return n.parent

just like in the book. Then add a MakeSet() that returns a single correctly initialized node, and maybe a SameSet() function too:

def SameSet(n1, n2):
    return Find(n1) == Find(n2)

You now have a working disjoint set implementation.

antonakos
  • 8,261
  • 2
  • 31
  • 34
  • In a real-world "working disjoint set implementation", there'd be a facility to iterate over all the sets, a facility to return all the members of the set that includes a given node, and the Node class would have some kind of "payload" attribute. – John Machin Feb 23 '11 at 21:38
  • @John By "working" I just mean that it works. Brandon links to two implementations, including one from a Python cookbook. Neither of these store payloads nor do they support iteration over the sets or retrieval of all members of a set. But they are still useful for the typical use-case of keeping track of connected components. – antonakos Feb 23 '11 at 22:39
  • nodes as defined by the OP have only a parent and a rank; what components does this keep track of??? The "one from a Python cookbook" implements the payload in the sample code (see "label"), and the mention of itertools.groupby suggests that it might be doing something useful. In Eppstein's code, the "hashable objects" are the "payload". – John Machin Feb 24 '11 at 00:44
  • @John My reading of Brandon's examples was a little sloppy, sorry. The nodes *can* carry a payload as in the Recipe demo, and Epstein's implementation (which wasn't from a real book, my mistake!) *does* allow iteration through all keys. But the disjoint set is about connectedness, not values. Store the set within the value, not the other way around. Then you can ask if two values are connected or join two values together. That's what is being kept track of. – antonakos Feb 24 '11 at 01:35
  • I looked over my implementation and played around with a few things, the basic issue I was having was that I was comparing results of `Find` operations which return `Node` objects. Really I needed to return `Node.parent` which are ints (in my case) and actually make sense to compare (as opposed to the references of `Node` objects. Bottom line you get the accept. – Jordan Feb 28 '11 at 22:09
1

Presuming that each node is initialised to be its own parent:

def Find(node):
    while node is not node.parent:
        node = node.parent
    return node
John Machin
  • 81,303
  • 11
  • 141
  • 189
  • @John Machin: How might I iterate through my entire collection of sets to find which set node is in? – Jordan Feb 23 '11 at 22:33
  • @Yoel: If you mean the set leader: `leader = Find(node)`. If you mean a list of fellow member nodes: `leader = Find(node); return [x for x in list_of_nodes if Find(x) is leader]` – John Machin Feb 24 '11 at 00:51
  • Beware that this does not do path compression. – antonakos Feb 24 '11 at 01:36
  • @John Machin: +1 for being the most helpful so far. What I meant by the collection of sets is, the list that contains all of the sets in my problem i.e. what `Initialize` returns. – Jordan Feb 27 '11 at 23:35
  • @Yoel: Thanks for the +1. I know what you mean by the collection of sets; I called it "list_of_nodes". My question was: what ANSWER do you require to your "which set node is in" query? The target set's representative/leader node, or a container-load of the nodes in the target set? – John Machin Feb 28 '11 at 00:30
  • @John Machin: I need the representative/leader node. – Jordan Feb 28 '11 at 02:51
  • @Yoel: We seem to be going around in a loop. The whole point of my first comment was that you DON'T need to iterate through the collection to get the representative node of the set that z belongs to; it is simply `Find(z)`; that is the whole purpose of the Find() function. – John Machin Feb 28 '11 at 05:27
  • @John Machin: Just wanted to say thank you for your help, ultimately I needed to tweek some things that @antonakos suggested but you were also really helpful. – Jordan Mar 01 '11 at 00:20
0

Using this implementation as a starting point, I've created a new python class to handle disjoint sets, which also supports persistence using a MongoDb.

With my class you should be able, for example, to:

  • Create an instance of UnionFind(), which uses python build-in dictionaries by default, and decide later to consolidate() your results in a MongoDb collection
  • Create an instance of UnionFind from a MongoDb collection and directly use it.

You might want to check the code on github.

Cheers, Simone

simonemainardi
  • 505
  • 3
  • 7
0

The Wikipedia page provides pseudocode for the basic operations on disjoint-set data structure. Here's a direct port to Python (employs path compression and union by rank):

class Node:
    """Represents an element of a set."""
    def __init__(self, id):
        self.id = id
        self.parent = self
        self.rank = 0
        self.size = 1

    def __repr__(self):
        return 'Node({!r})'.format(self.id)


def Find(x):
    """Returns the representative object of the set containing x."""
    if x.parent is not x:
        x.parent = Find(x.parent)
    return x.parent


def Union(x, y):
    """Combines the sets x and y belong to."""
    xroot = Find(x)
    yroot = Find(y)

    # x and y are already in the same set
    if xroot is yroot:
        return

    # x and y are not in same set, so we merge them
    if xroot.rank < yroot.rank:
        xroot, yroot = yroot, xroot  # swap xroot and yroot

    # merge yroot into xroot
    yroot.parent = xroot
    xroot.size += yroot.size
    if xroot.rank == yroot.rank:
        xroot.rank = xroot.rank + 1

Demo:

>>> a, b, c = map(Node, 'abc')
>>> Find(a), Find(b), Find(c)
(Node('a'), Node('b'), Node('c'))
>>> Find(a).size
1
>>> Union(a, b)
>>> Union(b, c)
>>> Find(a), Find(b), Find(c)
(Node('a'), Node('a'), Node('a'))
>>> Find(a).size
3
Eugene Yarmash
  • 142,882
  • 41
  • 325
  • 378