13

Given a one-to-one dictionary (=bijection) generated à la

for key, value in someGenerator:
     myDict[key] = value

an inverse lookup dictionary can be trivially created by adding

    invDict[value] = key

to the for loop. But is this a Pythonic way? Should I instead write a class Bijection(dict) which manages this inverted dictionary in addition and provides a second lookup function? Or does such a structure (or a similar one) already exist?

Tobias Kienzler
  • 25,759
  • 22
  • 127
  • 221
  • 5
    What about this [bidict](https://pypi.python.org/pypi/bidict) – Jon Clements May 28 '13 at 13:38
  • @JonClements Sounds perfect, thanks! I'd accept that as answer. Using slices for the inverse lookup is a great idea there – Tobias Kienzler May 28 '13 at 13:40
  • 1
    By the looks of it, `bidict` just wraps two separate Python dictionaries with forward and reverse mappings, so it's no more efficient than doing the same yourself. Indeed, if you're doing lots of key lookups, it'll be much slower due to the function call overhead. – Aya May 28 '13 at 13:46
  • 1
    It's hardly an answer and I have no experience in actually using it to personally recommend it either. Just to note though, if I remember correctly, there was some quite heavy debate about whether the API of that library was Pythonic or not in using slice notation. But, beauty in the eye of the beholder and all that. – Jon Clements May 28 '13 at 13:46
  • @Aya I looked at the source, it does so indeed. (see [here](https://bitbucket.org/jab/bidict/src/8a2eb8d16607395954175c389ed4c7b1721c074e/bidict.py?at=default#cl-437)), but for my purposes that is probably negligible. But one could use the `namedbidict` in that case – Tobias Kienzler May 28 '13 at 14:00
  • @JonClements I'm fine with that notation - hashes are not supposed to be ordered, so there is no way a slice could be misinterpreted unless the keys or values are explicitly integers... Though that's probably one reason to use the `namedbidict` instead – Tobias Kienzler May 28 '13 at 14:03
  • @TobiasKienzler Yep. Looks like `namedbidict` would be more efficient. – Aya May 28 '13 at 14:10
  • @Aya Though replacing bidict's `__getitem__` with a `try: return self._fwd[keyorslice]\n except TypeError: return self._bwd[keyorslice.stop]` (ignoring incorrect usage) might speed things up if required. (Dammit, I want linebreaks in comments back) – Tobias Kienzler May 28 '13 at 14:16
  • 1
    @TobiasKienzler Indeed. The try/except block will slow things down as well. In practice, I found `return d[k] if k in d else something_else` much faster than both `try: return d[k]\n except KeyError: return something_else` and `return d.get(k, something_else)`, i.e. two key lookups is faster than allowing Python to generate an exception and/or calling an instance method. Personally, I don't think it provides enough additional functionality (compared to just using two dicts) to warrant its performance issues, but if that's a non-issue, then it at least has the virtue of being easy to use. – Aya May 28 '13 at 14:31
  • @Aya I thought `try/except` isn't that terrible in Python? Anyway, [namedbidict can't be faster](https://bitbucket.org/jab/bidict/src/8a2eb8d16607395954175c389ed4c7b1721c074e/bidict.py?at=default#cl-589), so if speed is an issue one must either write another class which doesn't overload `__getitem__` but provides e.g. a `getKey(value)` or directly use a dict and its inverse, as you suggested – Tobias Kienzler May 28 '13 at 14:41
  • 2
    @TobiasKienzler Well, I recently had to optimize a fairly complex recursive function which was being called 150,000 times, so I spent quite a while profiling different methods with `cProfile`, and managed to save a lot of CPU time by using the `d[k] if k in d` approach. I later changed it to use `d = collections.defaultdict(lambda: None); v = d[k]; return something_else if v is None else v` which was even better, since it only requires one key lookup, key misses only ever happen once, and `defaultdict` is implemented in C. You're right about the `namedbidict` - I misread it the first time. – Aya May 28 '13 at 14:55
  • Assuming there's no overlap between key/value types, another option would be something like: `class Bidict(dict): def __setitem__(self, k, v): dict.__setitem__(self, k, v); dict.__setitem__(self, v, k)` i.e. use the same dict for both directions, but that may not be appropriate in your case. – Aya May 28 '13 at 15:10
  • @Aya Subclass `MutableMapping` instead of `dict`. http://stackoverflow.com/questions/3387691/python-how-to-perfectly-override-a-dict – gotgenes May 28 '13 at 15:48
  • @gotgenes I don't see how that addresses the performance issue associated with calling `__getitem__`. – Aya May 28 '13 at 15:55
  • @Aya Thanks for finding the true dupe, I only found [this one](http://stackoverflow.com/questions/1456373/two-way-reverse-map) which is restricted to your mentioned special case of no key/value collisions. Incidentally, [this question](http://stackoverflow.com/questions/12527724/build-a-dictionary-for-find-key-by-value) should be opened and reclosed as a dupe of your suggestion instead of the one I mentioned – Tobias Kienzler May 29 '13 at 07:02

2 Answers2

6

What I've done in the past is created a reversedict function, which would take a dict and return the opposite mapping, either values to keys if I knew it was one-to-one (throwing exceptions on seeing the same value twice), or values to lists of keys if it wasn't. That way, instead of having to construct two dicts at the same time each time I wanted the inverse look-up, I could create my dicts as normal and just call the generic reversedict function at the end.

However, it seems that the bidict solution that Jon mentioned in the comments is probably the better one. (My reversedict function seems to be his bidict's ~ operator).

Claudiu
  • 224,032
  • 165
  • 485
  • 680
  • bidict is perfect for my purposes - it basically stores the inverse dictionary inside, but improves access by using the slice operator – Tobias Kienzler May 28 '13 at 13:57
0

if you want O(log(n)) time for accessing values, you will need both a representation of the map and a representation of the inverse map.

otherwise the best you can do is O(log(n)) in one direction and O(n) in the other.

Edit: not O(log(n)), thanks Claudiu, but you are still going to need two data structures to implement the quick access times. And this will be more or less the same space as a dict and an inverse dict.

Zackkenyon
  • 352
  • 1
  • 2
  • 18