4

I am maintaining a dictionary that keeps track of the similarities between pairs of objects.
For example, this dictionary could look like this:

similarities = {
 p1: {p2: v12, p3:v13, p4:v14},
 p2: {p1: v21, p3:v23, p4:v24},
 p3: {p1: v31, p2:v32, p4:v34},
 p4: {p1: v41, p2:v42, p4:v43}
}

Note, that the similarity measurement is symmetric. Therefore, similarities[p1][p2] is the same as similarities[p2][p1] i.e. v12 == v21.

Sometimes, I'll need to eliminate p2 from similarities[p1]; and in doing so, I'll need to remove p1 and p2 from all the inner dictionaries in similarities as well.
This is tedious and inefficient.

So instead of maintaining a symmetric dictionary, is there a way to maintain a dictionary with a composite key so that I can lookup similarities[p1,p2]?

I can't really use a tuple since (p1, p2) != (p2, p1) and I can't know a priori how to order the tuple.

A frozenset is the only other container that I can think of, but that won't cut it since there may still be other keys in similarities that contain either p1 or p2 as a component. So what container could I use to solve this issue?

Technical info:

  • python 2.7
  • there will always be exactly 2 elements in this "composite key"

Thank you

inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241
  • 2
    `frozenset` seems like it would solve your problem to me -- Any reason to think that there might be something better? – mgilson Feb 13 '13 at 01:51
  • Can you use a syntax like `similarities[p1, p2]`? – Blender Feb 13 '13 at 01:52
  • @Blender: no that's illegal syntax and can at best be translated into a tuple – inspectorG4dget Feb 13 '13 at 01:52
  • @mgilson: I've actually explained why `frozenset` doesn't work (it was an edit shortly after the initial post) – inspectorG4dget Feb 13 '13 at 01:53
  • 2
    And what's wrong with frozenset here? – wim Feb 13 '13 at 01:53
  • @wim: I've explained why `frozenset` won't work. If you could be more specific about why that explanation doesn't work for you, I could explain it further – inspectorG4dget Feb 13 '13 at 01:55
  • 1
    Maybe this interests you: http://stackoverflow.com/q/2572916/674039 – wim Feb 13 '13 at 01:56
  • 2
    "since there may still be other keys in similarities that contain either p1 or p2 as a component" - If you mean that you'll have both `frozenset({p1, p2})` and `frozenset({p1, p3})`, I don't see how this is a problem. Otherwise I guess I don't understand your explanation. – Danica Feb 13 '13 at 01:56
  • @Dougal: if I remove `p2` from `similarities[p1]`, I must iterate do `similarities.pop(p1)`; and `similarities[p].pop(p2)` and `similarities.pop(p1)` for all `p` in similarities. Does that clarify? – inspectorG4dget Feb 13 '13 at 01:59
  • Does this do what you're trying to do (though of course in linear time, which is far from ideal) for the frozenset case? `similarities = {k: v for k, v in similarities.viewitems() if p2 not in k}` – Danica Feb 13 '13 at 02:01
  • @Dougal: Almost: `similarities = {k: v for k, v in similarities.viewitems() if p2 not in k and p1 not in k}` – inspectorG4dget Feb 13 '13 at 02:02
  • (edited) Oh, so you're trying to delete anything that has `p1` _or_ `p2` - missed that. I don't see any non-crazy dict-based options avoiding a linear-time pass on that. – Danica Feb 13 '13 at 02:03
  • @Dougal: that leaves `frozenset({p1,p3})` in `similarities` which should also be burninated – inspectorG4dget Feb 13 '13 at 02:04
  • @wim: it comes close, but misses out on the second of my requirements – inspectorG4dget Feb 13 '13 at 02:10

3 Answers3

2

I'd probably just use a frozenset, assuming the objects are hashable.

Alternatively, if they have any well-defined and consistent order on them, you could keep them in a tuple sorted by said order. You could write a little dict subclass to do that for you transparently if you wanted.

Or, you could do something like this:

class SymmetricDict(dict):
    def __getitem__(self, key):
        if key in self:
            return dict.__getitem__(self, key)
        a, b = key
        return dict.__getitem__(self, (b, a))

and similarly for __setitem__.

Danica
  • 28,423
  • 6
  • 90
  • 122
  • I think you'll probably need `dict.__getitem__(self,key)` instead of `self[key]` to avoid an infinite loop – mgilson Feb 13 '13 at 01:56
  • @mgilson Of course, I meant to write that and then my fingers just typed the wrong thing. :) Fixed. – Danica Feb 13 '13 at 01:57
  • To `super` or not to `super` ... That is the question ... ;-). (Personally, I tend to not use `super`) – mgilson Feb 13 '13 at 01:59
1

I think using frozenset is the only logical solution. You can find keys that match just one of the values using a comprehension with a set intersection test:

def remove_ab(ab, similarities):
    return {k:v for k, v in similarities.items() if not ab & k}

similarities = {frozenset({1, 2}): "v12",
                frozenset({1, 3}): "v13",
                frozenset({2, 3}): "v23",
                frozenset({3, 4}): "v34"}

similarities = remove_ab(frozenset({1, 2}), similarities
print(similarities) # output is {frozenset({3, 4}): 'v34'}
Blckknght
  • 100,903
  • 11
  • 120
  • 169
0

If the p_ objects are of a type that supports sorting, could you use a tuple with the two elements always in lo --> hi order?

Chris Johnson
  • 20,650
  • 6
  • 81
  • 80