-2

I have a structure in mind that is similar to a dict with tuples as keys, except you can look up entries with only one of the tuple elements.

e.g. (something similar to this. not real Python code, just an idea)

>>> d[(100, "apple")] = 5.0 # putting entry into dict
>>> d[(100, "pear")] = 10.0 # putting entry into dict
>>> d[(200, "pear")] = 10.0 # putting entry into dict
>>> d[100] # O(1) lookup
[("apple", 5.0), ("pear", 10.0)]
>>> d["pear"] # O(1) lookup
[(100, 10.0), (200, 10.0)]

You can't currently do this with a defaultdict(). What is the best way to do this in Python, or the best data structure to use? I'd like the lookup to be O(1), like it is for a dict.

In this case, neither tuple element will be unique, nor will the values.

I am considering:

  • Nested dicts
  • Two dicts?
  • Some database-like structure
Lily
  • 1
  • 4
  • 1
    Your second line is not (just) a lookup, it is also putting entry into the dict. Presumably you also want `d[200] = "pear", 10.0` to create a `(200, "pear")` to `10.0` mapping? What do you want a true lookup, such as `d[100]`, to return? How about `d[(100, "apple")]`? – user4815162342 Dec 02 '14 at 09:01
  • Will all of the elements in your triple be unique (no `100`, `'apple'` or `5.0` anywhere else)? – jonrsharpe Dec 02 '14 at 09:04
  • What should your lookup return if you have multiple elements for 100 (e.g. d[(100, "apple")] = 5.0 and d[(100, "pear")] = 6.0)? Do you expect to get a list of tuples? – Frank Schmitt Dec 02 '14 at 09:06
  • @user4815162342 The above code is incorrect because Python dicts don't handle that. I edited it a little to make it look less like I'm putting something in the dict in the second line. d[100] is supposed to return something like ("apple", 5.0). d[(100, "apple") will return 5.0, like a true dict. – Lily Dec 02 '14 at 09:08
  • I think you didn't get the problem right... You break a naturally key. Just add a few duplicate `100` entries and show what should happen. Because right now you have just a minimalistic example without the interesting cases... – wenzul Dec 02 '14 at 09:09
  • @jonrsharpe Good question. Yes, in this case, all the numbers will be unique so there wouldn't be entries like d[(100, "pear")] also in the dict. The strings and values won't be unique, though. So there could be d[(200, "apple")] = 5.0 – Lily Dec 02 '14 at 09:10
  • So whats's the problem just using an `int` as key if the number will be unique? – wenzul Dec 02 '14 at 09:12
  • @wenzul, exactly... just store list of tuples as values and call it a day imho – user3012759 Dec 02 '14 at 09:13
  • 1
    What is the use case? How are you going to use this, practically? Perhaps there is a different (better) way to do whatever you want to do. –  Dec 02 '14 at 09:18
  • Thanks for the feedback, guys. I'm afraid I was mistaken earlier. Neither of the tuple elements are unique. So there can be multiple entries with (100, ...) keys and looking up should return ALL of them. And then when you search through those results for the string you need, you get the value. Editing original question – Lily Dec 02 '14 at 09:25
  • @Evert Best way to explain use case is like a database lookup. Looking up one of the elements of the tuple will result in every associated dict entry. – Lily Dec 02 '14 at 09:30
  • Here we go, now we can imaging what you will achieve. You could use a solution from the answers. If this will be a huge dictionary you could also think about using a database e.g. [sqlite](https://docs.python.org/2/library/sqlite3.html) in memory. – wenzul Dec 02 '14 at 09:32
  • The data is huge, yes, so I am looking into sqlite at the moment. Thank you! – Lily Dec 02 '14 at 09:41
  • Ok, so may I have the chance to be the right most downvoted answer. :P – wenzul Dec 02 '14 at 09:46
  • A huge amount of data and some structured database-style data type: you may want to look into using [numpy's record arrays](http://docs.scipy.org/doc/numpy/user/basics.rec.html), or probably better, use [Pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html) with its Series and DataFrames. –  Dec 02 '14 at 10:01
  • Could you provide me a good example of how sqlite can do this? Would be a great starting place, thanks – Lily Dec 02 '14 at 10:02
  • I have updated my answer with an `sqlite3` example. But it's really more complex than a dictionary lookup. I don't know Pandas or similar but I think more Python database-style structures will fit better for your problem. – wenzul Dec 02 '14 at 11:42

3 Answers3

1

Consider something like:

class MyDict(object):

    def __init__(self):
        self._data = {}

    def __setitem__(self, key, val):
        self._data[key] = val
        if key[0] not in self._data:
            self._data[key[0]] = {}
        self._data[key[0]][key[1]] = val

    def __getitem__(self, key):
        return self._data[key]

This allows lookup by the tuple or its first element, both in O(1). You can then implement the same for key[1]. Using a dictionary of dictionaries makes subsequent lookup of the other part of the key O(1) too. In use:

>>> d = MyDict()
>>> d[100, "apple"] = 5.0
>>> d[100, "pear"] = 10.0
>>> print d[100]
{'pear': 10.0, 'apple': 5.0}
>>> print d._data
{(100, 'apple'): 5.0, (100, 'pear'): 10.0, 100: {'pear': 10.0, 'apple': 5.0}}

Note that this assumes that each combination key[0], key[1] will only have a single val.

See e.g. How to "perfectly" override a dict? on making custom dictionaries.

Community
  • 1
  • 1
jonrsharpe
  • 115,751
  • 26
  • 228
  • 437
0

There are two common possibilities.

  • For smaller dictionaries: Overwrite __getitem__ and __setitem__ of dict to implement your requirements. There are several questions at SO about that.
  • For larger dictionaries: Using a database where you have a usable query language. For example: sqlite in memory but it depends on persistence and API requirements.

To answer your question for a sqlite3 example

import sqlite3

con = sqlite3.connect(":memory:") # could also be a file
con.isolation_level = None
cur = con.cursor()

# Create table
cur.execute("CREATE TABLE foo (key INTEGER, fruit TEXT, value REAL)")
# Create indexes for fast lookup
cur.execute("CREATE INDEX index_key ON foo (key)")
cur.execute("CREATE INDEX index_fruit ON foo (fruit)")

def insert(key, fruit, value):
    cur.execute("INSERT INTO foo VALUES (?, ?, ?)", (key, fruit, value))

insert(100, "apple", 5.0)
insert(100, "pear", 10.0)
insert(200, "pear", 10.0)

print cur.execute("SELECT * FROM foo WHERE key=?", (100,)).fetchall()
print cur.execute("SELECT * FROM foo WHERE fruit=?", ("pear",)).fetchall()

[(100, u'apple', 5.0), (100, u'pear', 10.0)]
[(100, u'pear', 10.0), (200, u'pear', 10.0)]

You could also use an object relational mapper or another database like key-value store which may matches better for your needs.

Using a database is truely not faster than a dictionary lookup. But you have some advantages like persistance and a query API. I would not optimize for speed to early if databases will match your needs. It's up to you.

wenzul
  • 3,948
  • 2
  • 21
  • 33
  • I'm not the one who downvoted you, but this is false. The hash makes finding the candidates O(1), but you still have to make the comparison - which is O(n) for the size of the tuple - since hashes can collide. You can verify this by yourself with a tuple of sufficient length, eg. `t1 = tuple(range(10000000)); t2 = tuple(range(10000000)); d = {t1: 'value'}; d[t2]` and seeing that the lookup isn't instantaneous. – Aleksi Torhamo Dec 02 '14 at 09:39
  • @AleksiTorhamo Thank you for your comment. Are you really sure about that? I think the time delay depends on the hash generation of the key... – wenzul Dec 02 '14 at 09:44
  • Yes, I know it must perform the comparison, since otherwise it would be broken. You can verify this by creating a class that always returns the same hash, but verifies equality based on some attribute, and then using instances of the class as dictionary keys. The keys will return the correct associated value even though they all have the same hash. You can also verify that the time delay does not depend on the hash generation, by eg. just trying it again, since hash values are cached. Or by other tests, which I won't explain in detail, since I'm running out of characters. – Aleksi Torhamo Dec 02 '14 at 09:56
  • Hm, yes sure there is kind of backlog list in hashlists which handles collisions. But for the regular case it should be O(1) I mean. – wenzul Dec 02 '14 at 10:05
  • No, you have to do the comparison even without collisions. With a class having a constant hash, if you save an instance with one value and query with an instance with another value, it will see that the hashes are the same, perform the comparison, and give you a KeyError. The comparison is *always* performed. Since you're probably not using *huge* tuples as keys, this will likely not matter at all, but the fact still stands that the comparison is O(n) with relation to the tuples length, not O(1). If you use a humongous tuple, it's going to take time. – Aleksi Torhamo Dec 02 '14 at 10:16
  • I agree that there is at least a comparision for checking if the calculated hash is a valid entry in the hash list or not. May we should devide in best case O(1) and worst case O(n) if all items are collide. Thanks for your obstinacy. :) – wenzul Dec 02 '14 at 10:22
  • I googled that. Maybe also look into [this](https://wiki.python.org/moin/TimeComplexity) entry. – wenzul Dec 02 '14 at 10:25
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/66021/discussion-between-aleksi-torhamo-and-wenzul). – Aleksi Torhamo Dec 02 '14 at 10:42
  • There is also a topic on [SO](https://stackoverflow.com/questions/1963507/time-complexity-of-accessing-a-python-dict). – wenzul Dec 02 '14 at 10:43
-1

Why don't you use namedtuple like

>>>>from collections import namedtuple  
>>>>Fruit = namedtuple("Fruit", ["name", "count"])  
>>>>f1 = Fruit(name="apple", count=100)  
>>>>f2 = Fruit(name="pear", count=100)  
>>>>print f1  
Fruit(name='apple', count=100)  
>>>>print f2  
Fruit(name='pear', count=100)  
>>>>f1.name  
'apple'  
>>>>f1.count
100
>>>>f2.name  
'pear'  

Now use the fruitcount dictionary as,

>>>>fruitcount = {f1: 5.0}
>>>>fruitcount = {f2: 10.0}
>>>>fruitcount[f1]
5.0
>>>>fruitcount[f2]
10.0

Refer here for more info: python-docs

kmario23
  • 57,311
  • 13
  • 161
  • 150