Keyed Collection in Python?

Question

Is there any equivalent to KeyedCollection in Python, i.e. a set where the elements have (or dynamically generate) their own keys?

i.e. the goal here is to avoid storing the key in two places, and therefore dictionaries are less than ideal (hence the question).

Could you say more about why you don't want to have keys and values? Python will generally just reference the same object as a key and a value, so you aren't incurring twice the memory. — Ned Batchelder, Jul 21 '11 at 22:50
@Ned: Because semantically, it doesn't make as much sense. When an object *knows* its key, it doesn't make sense to put it in a dictionary -- it's *not* a key-value pair. It's more of a semantic issue than anything else. (There's also the issue that `KeyedCollection` can be **indexed** by an ordinal as well, but I don't need that here specifically. It's useful in other situations, though.) — user541686, Jul 21 '11 at 22:53

Gabi Purcaru · Answer 1 · 2011-07-21T18:23:00.713

4

You can simulate that very easily:

class KeyedObject(object):
    def get_key(self):
        raise NotImplementedError("You must subclass this before you can use it.")

class KeyedDict(dict):
    def append(self, obj):
        self[obj.get_key()] = obj

Now you can use a KeyedDict instead of dict with subclasses of KeyedObject (where get_key return a valid key based on some object property).

edited Jul 21 '11 at 18:23

answered Jul 21 '11 at 18:14

Gabi Purcaru

30,940
9
79
95

1

@Mehrdad: this doesn't store the key twice, it stores two *references* to the key. If you have a large string as a key, the string only exists in memory once. – Ned Batchelder Jul 21 '11 at 22:49
@Ned: Yeah I know it's storing a reference, but it's storing the reference twice. Not only do I not like duplication, but more importantly, see my comment on your other comment. – user541686 Jul 21 '11 at 22:55
@Mehrdad: Memory is usually plentiful (until it's not). Which would your users prefer: a program that works quickly because it uses data structures appropriate to the task at hand, or a program that works less quickly but doesn't have unnecessary duplication in its internal data structures? – kindall Jul 22 '11 at 00:16
@kindall: It's not a memory issue -- it's a readability and semantics issue. If I have to store something twice, then I have to keep track of it twice, and that gets that much harder to read. Being worried about memory here is a bit overkill; that's not a concern. – user541686 Jul 22 '11 at 00:43

kindall · Answer 2 · 2011-07-22T00:18:40.587

Given your constraints, everyone trying to implement what you're looking for using a dict is barking up the wrong tree. Instead, you should write a list subclass that overrides __getitem__ to provide the behavior you want. I've written it so it tries to get the desired item by index first, then falls back to searching for the item by the key attribute of the contained objects. (This could be a property if the object needs to determine this dynamically.)

There's no way to avoid a linear search if you don't want to duplicate something somewhere; I am sure the C# implementation does exactly the same thing if you don't allow it to use a dictionary to store the keys.

class KeyedCollection(list):
     def __getitem__(self, key):
         if isinstance(key, int) or isinstance(key, slice):
             return list.__getitem__(key)
         for item in self:
             if getattr(item, "key", 0) == key:
                 return item
         raise KeyError('item with key `%s` not found' % key)

You would probably also want to override __contains__ in a similar manner so you could say if "key" in kc.... If you want to make it even more like a dict, you could also implement keys() and so on. They will be equally inefficient, but you will have an API like a dict, that also works like a list.

Thanks for the info! Though I could implement it by hand, I was wondering if there's something built-in so that I wouldn't have to define/include the class in every project I need it for. But anyway, this is definitely a solution; +1 for the help. :) — user541686, Jul 22 '11 at 00:46

Paul · Answer 3 · 2011-07-21T18:13:00.693

1

I'm not much of a C#'er, but I think dictionaries is what you need.

http://docs.python.org/tutorial/datastructures.html#dictionaries

http://docs.python.org/tutorial/datastructures.html

Or maybe lists:

http://docs.python.org/library/functions.html#list

edited Jul 21 '11 at 18:13

answered Jul 21 '11 at 18:07

Paul

20,883
7
57
74

Ned Batchelder · Accepted Answer · 2011-07-21T23:19:20.117

@Mehrdad said:

Because semantically, it doesn't make as much sense. When an object knows its key, it doesn't make sense to put it in a dictionary -- it's not a key-value pair. It's more of a semantic issue than anything else.

With this constraint, there is nothing in Python that does what you want. I suggest you use a dict and not worry about this level of detail on the semantics. @Gabi Purcaru's answer shows how you can create an object with the interface you want. Why get bothered about how it's working internally?

It could be that C#'s KeyedCollection is doing the same thing under the covers: asking the object for its key and then storing the key for fast access. In fact, from the docs:

By default, the KeyedCollection(Of TKey, TItem) includes a lookup dictionary that you can obtain with the Dictionary property. When an item is added to the KeyedCollection(Of TKey, TItem), the item's key is extracted once and saved in the lookup dictionary for faster searches. This behavior is overridden by specifying a dictionary creation threshold when you create the KeyedCollection(Of TKey, TItem). The lookup dictionary is created the first time the number of elements exceeds that threshold. If you specify –1 as the threshold, the lookup dictionary is never created.

It does, but (1) it can be disabled, and (2) it still allows access by ordinal. — user541686, Jul 21 '11 at 23:17
@Mehrdad: make up your mind. You said you didn't care about access by ordinal. — Ned Batchelder, Jul 21 '11 at 23:21
I said don't care about it for **this** case, but I do care about it in general (i.e. most other times when I need a KeyedCollection). But notice point (1) as well. — user541686, Jul 21 '11 at 23:27
OK, well, we're building up a Python implementation of KeyedCollection, we can add in indexed lookup too, but keep in mind: that will require storing another reference to the value. — Ned Batchelder, Jul 21 '11 at 23:38

Fabien Letort · Answer 5 · 2017-12-14T16:37:24.410

To go a little more in detail that the already correct answer from @Gabi Purcaru's answer, here a class that do the same as gabi one's but that also check for correct given type on key and value (as the TKey and TValue of the .net KeyedCollection).

class KeyedCollection(MutableMapping):
    """
    Provides the abstract base class for a collection (:class:`MutableMappinp`) whose keys are embedded in the values.
    """
    __metaclass__ = abc.ABCMeta
    _dict = None  # type: dict

    def __init__(self, seq={}):
        self._dict = dict(seq)

    @abc.abstractmethod
    def __is_type_key_correct__(self, key):
        """
        Returns: The type of keys in the collection
        """
        pass

    @abc.abstractmethod
    def __is_type_value_correct__(self, value):
        """
        Returns: The type of values in the collection
        """
        pass

    @abc.abstractmethod
    def get_key_for_item(self, value):
        """
        When implemented in a derivated class, extracts the key from the specified element.
        Args:
            value: the element from which to extract the key (of type specified by :meth:`type_value`)

        Returns: The key of specified element (of type specified by :meth:`type_key`)
        """
        pass

    def __assert_type_key(self, key, arg_name='key'):
        if not self.__is_type_key_correct__(key) :
            raise ValueError("{} type is not correct".format(arg_name))

    def __assert_type_value(self, value, arg_name='value'):
        if not self.__is_type_value_correct__(value) :
            raise ValueError("{} type is not correct".format(arg_name))

    def add(self, value):
        """
        Adds an object to the KeyedCollection.
        Args:
            value: The object to be added to the KeyedCollection (of type specified by :meth:`type_value`).
        """
        key = self.get_key_for_item(value)
        self._dict[key] = value

    # Implements abstract method __setitem__ from MutableMapping parent class
    def __setitem__(self, key, value):
        self.__assert_type_key(key)
        self.__assert_type_value(value)
        if value.get_key() != key:
            raise ValueError("provided key does not correspond to the given KeyedObject value")
        self._dict[key] = value

    # Implements abstract method __delitem__ from MutableMapping parent class
    def __delitem__(self, key):
        self.__assert_type_key(key)
        self._dict.pop(key)

    # Implements abstract method __getitem__ from MutableMapping parent class (Mapping base class)
    def __getitem__(self, key):
        self.__assert_type_key(key)
        return self._dict[key]

    # Implements abstract method __len__ from MutableMapping parent class (Sized mixin on Mapping base class)
    def __len__(self):
        return len(self._dict)

    # Implements abstract method __iter__ from MutableMapping parent class (Iterable mixin on Mapping base class)
    def __iter__(self):
        return iter(self._dict)
        pass

    # Implements abstract method __contains__ from MutableMapping parent class (Container mixin on Mapping base class)
    def __contains__(self, x):
        self.__assert_type_key(x, 'x')
        return x in self._dict

score 0 · Answer 6 · answered Jul 21 '11 at 18:43

Why not simply use a dict? If the key already exists, a reference to the key will be used in the dict; it won't be senselessly duplicated.

class MyExample(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value

m = MyExample("foo", "bar")
d = {}

d[m.key] = m

first_key = d.keys()[0]
first_key is m.key  # returns True

If the key doesn't already exist, a copy of it will be saved, but I don't see that as a problem.

def lame_hash(s):
    h = 0
    for ch in s:
        h ^= ord(ch)
    return h

d = {}
d[lame_hash(m.key)] = m
print d  # key value is 102 which is stored in the dict

lame_hash(m.key) in d  # returns True

score 0 · Answer 7 · answered Jul 21 '11 at 21:35

0

I'm not sure if this is what you meant, but this dictionary will create it's own keys as you add to it...

class KeyedCollection(dict):
    def __init__(self):
        self.current_key = 0
    def add(self, item):
        self[self.current_key] = item

abc = KeyedCollection()
abc.add('bob')
abc.add('jane')
>>> abc
{0: 'bob', 1: 'jane'}

answered Jul 21 '11 at 21:35

sampwing

1,238
1
10
13

I'm trying to avoid storing the key twice, which this doesn't do. :\ – user541686 Jul 21 '11 at 21:38

score 0 · Answer 8 · answered Jul 22 '11 at 03:56

0

How about a set()? The elements can have their own k

answered Jul 22 '11 at 03:56

Zzz

1

Keyed Collection in Python?

8 Answers8