115

Suppose I have quantities of fruits of different colors, e.g., 24 blue bananas, 12 green apples, 0 blue strawberries and so on. I'd like to organize them in a data structure in Python that allows for easy selection and sorting. My idea was to put them into a dictionary with tuples as keys, e.g.,

{
    ('banana',    'blue' ): 24,
    ('apple',     'green'): 12,
    ('strawberry','blue' ): 0,
    # ...
}

or even dictionaries, e.g.,

{
    {'fruit': 'banana',    'color': 'blue' }: 24,
    {'fruit': 'apple',     'color': 'green'}: 12,
    {'fruit': 'strawberry','color': 'blue' }: 0,
    # ...
}

I'd like to retrieve a list of all blue fruit, or bananas of all colors, for example, or to sort this dictionary by the name of the fruit. Are there ways to do this in a clean way?

It might well be that dictionaries with tuples as keys are not the proper way to handle this situation.

All suggestions welcome!

Nico Schlömer
  • 53,797
  • 27
  • 201
  • 249
  • 31
    It sounds like you want a database... – Adam Rosenfield Feb 02 '11 at 19:28
  • 4
    You'd be best off defining a clsas to model this data, rather than trying to coordinate different collections of these values – Cuga Feb 02 '11 at 20:18
  • 2
    @AdamRosenfield maybe he's building one. – Prof. Falken Oct 26 '12 at 09:05
  • 2
    Just wanted to add that a dictionary is not hashable so the second syntax you where asking about is not possible because {'fruit': 'banana', 'color': 'blue' } which is a dictionary can not be used as a key for another dictionary. it would cause a TypeError: unhashable type: 'dict'. – epeleg Aug 02 '17 at 12:46

8 Answers8

158

Personally, one of the things I love about python is the tuple-dict combination. What you have here is effectively a 2d array (where x = fruit name and y = color), and I am generally a supporter of the dict of tuples for implementing 2d arrays, at least when something like numpy or a database isn't more appropriate. So in short, I think you've got a good approach.

Note that you can't use dicts as keys in a dict without doing some extra work, so that's not a very good solution.

That said, you should also consider namedtuple(). That way you could do this:

>>> from collections import namedtuple
>>> Fruit = namedtuple("Fruit", ["name", "color"])
>>> f = Fruit(name="banana", color="red")
>>> print f
Fruit(name='banana', color='red')
>>> f.name
'banana'
>>> f.color
'red'

Now you can use your fruitcount dict:

>>> fruitcount = {Fruit("banana", "red"):5}
>>> fruitcount[f]
5

Other tricks:

>>> fruits = fruitcount.keys()
>>> fruits.sort()
>>> print fruits
[Fruit(name='apple', color='green'), 
 Fruit(name='apple', color='red'), 
 Fruit(name='banana', color='blue'), 
 Fruit(name='strawberry', color='blue')]
>>> fruits.sort(key=lambda x:x.color)
>>> print fruits
[Fruit(name='banana', color='blue'), 
 Fruit(name='strawberry', color='blue'), 
 Fruit(name='apple', color='green'), 
 Fruit(name='apple', color='red')]

Echoing chmullig, to get a list of all colors of one fruit, you would have to filter the keys, i.e.

bananas = [fruit for fruit in fruits if fruit.name=='banana']
senderle
  • 145,869
  • 36
  • 209
  • 233
  • #senderle You wrote as a comment to another answer "But my gut feeling is that a database is overkill for the OP's needs; " ; So you prefer creating a namedtuple subclass. But what else are instances of classes if not micro-databases with their own tools to process their data ? – eyquem Feb 02 '11 at 22:06
  • Could I from those extract sublists with `name='banana'`? – Nico Schlömer Feb 03 '11 at 01:05
  • 2
    As chmullig pointed out, you would have to filter the keys, i.e. `bananas = filter(lambda fruit: fruit.name=='banana', fruits)` or `bananas = [fruit for fruit in fruits if fruit.name=='banana']`. This is one way in which nested dicts are potentially more efficient; it all comes down to the ways you plan to use the data. – senderle Feb 03 '11 at 01:23
  • would not adding a more key in the named tuple make things easier? I'd say add a new attribute `count` – openrijal Jan 30 '18 at 21:02
20

Your best option will be to create a simple data structure to model what you have. Then you can store these objects in a simple list and sort/retrieve them any way you wish.

For this case, I'd use the following class:

class Fruit:
    def __init__(self, name, color, quantity): 
        self.name = name
        self.color = color
        self.quantity = quantity

    def __str__(self):
        return "Name: %s, Color: %s, Quantity: %s" % \
     (self.name, self.color, self.quantity)

Then you can simply construct "Fruit" instances and add them to a list, as shown in the following manner:

fruit1 = Fruit("apple", "red", 12)
fruit2 = Fruit("pear", "green", 22)
fruit3 = Fruit("banana", "yellow", 32)
fruits = [fruit3, fruit2, fruit1] 

The simple list fruits will be much easier, less confusing, and better-maintained.

Some examples of use:

All outputs below is the result after running the given code snippet followed by:

for fruit in fruits:
    print fruit

Unsorted list:

Displays:

Name: banana, Color: yellow, Quantity: 32
Name: pear, Color: green, Quantity: 22
Name: apple, Color: red, Quantity: 12

Sorted alphabetically by name:

fruits.sort(key=lambda x: x.name.lower())

Displays:

Name: apple, Color: red, Quantity: 12
Name: banana, Color: yellow, Quantity: 32
Name: pear, Color: green, Quantity: 22

Sorted by quantity:

fruits.sort(key=lambda x: x.quantity)

Displays:

Name: apple, Color: red, Quantity: 12
Name: pear, Color: green, Quantity: 22
Name: banana, Color: yellow, Quantity: 32

Where color == red:

red_fruit = filter(lambda f: f.color == "red", fruits)

Displays:

Name: apple, Color: red, Quantity: 12
Cuga
  • 17,668
  • 31
  • 111
  • 166
19

Database, dict of dicts, dictionary of list of dictionaries, named tuple (it's a subclass), sqlite, redundancy... I didn't believe my eyes. What else ?

"It might well be that dictionaries with tuples as keys are not the proper way to handle this situation."

"my gut feeling is that a database is overkill for the OP's needs; "

Yeah! I thought

So, in my opinion, a list of tuples is plenty enough :

from operator import itemgetter

li = [  ('banana',     'blue'   , 24) ,
        ('apple',      'green'  , 12) ,
        ('strawberry', 'blue'   , 16 ) ,
        ('banana',     'yellow' , 13) ,
        ('apple',      'gold'   , 3 ) ,
        ('pear',       'yellow' , 10) ,
        ('strawberry', 'orange' , 27) ,
        ('apple',      'blue'   , 21) ,
        ('apple',      'silver' , 0 ) ,
        ('strawberry', 'green'  , 4 ) ,
        ('banana',     'brown'  , 14) ,
        ('strawberry', 'yellow' , 31) ,
        ('apple',      'pink'   , 9 ) ,
        ('strawberry', 'gold'   , 0 ) ,
        ('pear',       'gold'   , 66) ,
        ('apple',      'yellow' , 9 ) ,
        ('pear',       'brown'  , 5 ) ,
        ('strawberry', 'pink'   , 8 ) ,
        ('apple',      'purple' , 7 ) ,
        ('pear',       'blue'   , 51) ,
        ('chesnut',    'yellow',  0 )   ]


print set( u[1] for u in li ),': all potential colors'
print set( c for f,c,n in li if n!=0),': all effective colors'
print [ c for f,c,n in li if f=='banana' ],': all potential colors of bananas'
print [ c for f,c,n in li if f=='banana' and n!=0],': all effective colors of bananas'
print

print set( u[0] for u in li ),': all potential fruits'
print set( f for f,c,n in li if n!=0),': all effective fruits'
print [ f for f,c,n in li if c=='yellow' ],': all potential fruits being yellow'
print [ f for f,c,n in li if c=='yellow' and n!=0],': all effective fruits being yellow'
print

print len(set( u[1] for u in li )),': number of all potential colors'
print len(set(c for f,c,n in li if n!=0)),': number of all effective colors'
print len( [c for f,c,n in li if f=='strawberry']),': number of potential colors of strawberry'
print len( [c for f,c,n in li if f=='strawberry' and n!=0]),': number of effective colors of strawberry'
print

# sorting li by name of fruit
print sorted(li),'  sorted li by name of fruit'
print

# sorting li by number 
print sorted(li, key = itemgetter(2)),'  sorted li by number'
print

# sorting li first by name of color and secondly by name of fruit
print sorted(li, key = itemgetter(1,0)),'  sorted li first by name of color and secondly by name of fruit'
print

result

set(['blue', 'brown', 'gold', 'purple', 'yellow', 'pink', 'green', 'orange', 'silver']) : all potential colors
set(['blue', 'brown', 'gold', 'purple', 'yellow', 'pink', 'green', 'orange']) : all effective colors
['blue', 'yellow', 'brown'] : all potential colors of bananas
['blue', 'yellow', 'brown'] : all effective colors of bananas

set(['strawberry', 'chesnut', 'pear', 'banana', 'apple']) : all potential fruits
set(['strawberry', 'pear', 'banana', 'apple']) : all effective fruits
['banana', 'pear', 'strawberry', 'apple', 'chesnut'] : all potential fruits being yellow
['banana', 'pear', 'strawberry', 'apple'] : all effective fruits being yellow

9 : number of all potential colors
8 : number of all effective colors
6 : number of potential colors of strawberry
5 : number of effective colors of strawberry

[('apple', 'blue', 21), ('apple', 'gold', 3), ('apple', 'green', 12), ('apple', 'pink', 9), ('apple', 'purple', 7), ('apple', 'silver', 0), ('apple', 'yellow', 9), ('banana', 'blue', 24), ('banana', 'brown', 14), ('banana', 'yellow', 13), ('chesnut', 'yellow', 0), ('pear', 'blue', 51), ('pear', 'brown', 5), ('pear', 'gold', 66), ('pear', 'yellow', 10), ('strawberry', 'blue', 16), ('strawberry', 'gold', 0), ('strawberry', 'green', 4), ('strawberry', 'orange', 27), ('strawberry', 'pink', 8), ('strawberry', 'yellow', 31)]   sorted li by name of fruit

[('apple', 'silver', 0), ('strawberry', 'gold', 0), ('chesnut', 'yellow', 0), ('apple', 'gold', 3), ('strawberry', 'green', 4), ('pear', 'brown', 5), ('apple', 'purple', 7), ('strawberry', 'pink', 8), ('apple', 'pink', 9), ('apple', 'yellow', 9), ('pear', 'yellow', 10), ('apple', 'green', 12), ('banana', 'yellow', 13), ('banana', 'brown', 14), ('strawberry', 'blue', 16), ('apple', 'blue', 21), ('banana', 'blue', 24), ('strawberry', 'orange', 27), ('strawberry', 'yellow', 31), ('pear', 'blue', 51), ('pear', 'gold', 66)]   sorted li by number

[('apple', 'blue', 21), ('banana', 'blue', 24), ('pear', 'blue', 51), ('strawberry', 'blue', 16), ('banana', 'brown', 14), ('pear', 'brown', 5), ('apple', 'gold', 3), ('pear', 'gold', 66), ('strawberry', 'gold', 0), ('apple', 'green', 12), ('strawberry', 'green', 4), ('strawberry', 'orange', 27), ('apple', 'pink', 9), ('strawberry', 'pink', 8), ('apple', 'purple', 7), ('apple', 'silver', 0), ('apple', 'yellow', 9), ('banana', 'yellow', 13), ('chesnut', 'yellow', 0), ('pear', 'yellow', 10), ('strawberry', 'yellow', 31)]   sorted li first by name of color and secondly by name of fruit
George Stocker
  • 57,289
  • 29
  • 176
  • 237
eyquem
  • 26,771
  • 7
  • 38
  • 46
  • 1
    Hi, I like your solution however it does not address issues of operation complexity. all the search types are liner ( O(n) ) in the size of the list. while It would make sense that the OP wanted to have some actions be faster then others (e.g. getting the yellow banana count would be something that I would expect to be possible in O(1). – epeleg Aug 02 '17 at 12:31
13

A dictionary probably isn't what you should be using in this case. A more full featured library would be a better alternative. Probably a real database. The easiest would be sqlite. You can keep the whole thing in memory by passing in the string ':memory:' instead of a filename.

If you do want to continue down this path, you can do it with the extra attributes in the key or the value. However a dictionary can't be the key to a another dictionary, but a tuple can. The docs explain what's allowable. It must be an immutable object, which includes strings, numbers and tuples that contain only strings and numbers (and more tuples containing only those types recursively...).

You could do your first example with d = {('apple', 'red') : 4}, but it'll be very hard to query for what you want. You'd need to do something like this:

#find all apples
apples = [d[key] for key in d.keys() if key[0] == 'apple']

#find all red items
red = [d[key] for key in d.keys() if key[1] == 'red']

#the red apple
redapples = d[('apple', 'red')]
chmullig
  • 13,006
  • 5
  • 35
  • 52
  • 4
    I didn't, and wouldn't, downvote this answer, because at larger scales databases are (obviously!) the best way to go. But my gut feeling is that a database is overkill for the OP's needs; perhaps that explains the downvote? – senderle Feb 02 '11 at 20:08
4

With keys as tuples, you just filter the keys with given second component and sort it:

blue_fruit = sorted([k for k in data.keys() if k[1] == 'blue'])
for k in blue_fruit:
  print k[0], data[k] # prints 'banana 24', etc

Sorting works because tuples have natural ordering if their components have natural ordering.

With keys as rather full-fledged objects, you just filter by k.color == 'blue'.

You can't really use dicts as keys, but you can create a simplest class like class Foo(object): pass and add any attributes to it on the fly:

k = Foo()
k.color = 'blue'

These instances can serve as dict keys, but beware their mutability!

9000
  • 39,899
  • 9
  • 66
  • 104
3

You could have a dictionary where the entries are a list of other dictionaries:

fruit_dict = dict()
fruit_dict['banana'] = [{'yellow': 24}]
fruit_dict['apple'] = [{'red': 12}, {'green': 14}]
print fruit_dict

Output:

{'banana': [{'yellow': 24}], 'apple': [{'red': 12}, {'green': 14}]}

Edit: As eumiro pointed out, you could use a dictionary of dictionaries:

fruit_dict = dict()
fruit_dict['banana'] = {'yellow': 24}
fruit_dict['apple'] = {'red': 12, 'green': 14}
print fruit_dict

Output:

{'banana': {'yellow': 24}, 'apple': {'green': 14, 'red': 12}}

GreenMatt
  • 18,244
  • 7
  • 53
  • 79
  • 2
    Dictionary of list of dictionaries? Maybe dictionary of dictionary would suffice? – eumiro Feb 02 '11 at 19:43
  • @eumiro: Thanks, you're right, and that was my original idea. However, I turned it into a dict of lists of dicts while coding up the original example. I've added a dict of dicts example. – GreenMatt Feb 02 '11 at 19:53
  • Nested dictionaries tend to be confusing. Please see my answer – Cuga Feb 02 '11 at 20:17
  • @Cuga: I agree that dicts of dicts, etc. can get confusing. I am just providing an illustrative example to answer @Nico's question as asked. – GreenMatt Feb 02 '11 at 20:41
  • I apologize: I didn't mean to imply your solution is wrong; it clearly works and in some situations it could be the ideal one. I wanted to share my take on the situation. – Cuga Feb 02 '11 at 21:33
2

This type of data is efficiently pulled from a Trie-like data structure. It also allows for fast sorting. The memory efficiency might not be that great though.

A traditional trie stores each letter of a word as a node in the tree. But in your case your "alphabet" is different. You are storing strings instead of characters.

it might look something like this:

root:                Root
                     /|\
                    / | \
                   /  |  \     
fruit:       Banana Apple Strawberry
              / |      |     \
             /  |      |      \
color:     Blue Yellow Green  Blue
            /   |       |       \
           /    |       |        \
end:      24   100      12        0

see this link: trie in python

Community
  • 1
  • 1
Scott Morken
  • 1,591
  • 1
  • 11
  • 20
2

You want to use two keys independently, so you have two choices:

  1. Store the data redundantly with two dicts as {'banana' : {'blue' : 4, ...}, .... } and {'blue': {'banana':4, ...} ...}. Then, searching and sorting is easy but you have to make sure you modify the dicts together.

  2. Store it just one dict, and then write functions that iterate over them eg.:

    d = {'banana' : {'blue' : 4, 'yellow':6}, 'apple':{'red':1} }
    
    blueFruit = [(fruit,d[fruit]['blue']) if d[fruit].has_key('blue') for fruit in d.keys()]
    
BenMorel
  • 34,448
  • 50
  • 182
  • 322
highBandWidth
  • 16,751
  • 20
  • 84
  • 131
  • I can't figure out why the code in my answer doesn't show up in the correct format. I've tried editing and marking the last two lines as code, but it doesnt work! – highBandWidth Feb 04 '11 at 21:23
  • 1
    you've created a numbered list, and the parser is interpreting the code (indented 4 spaces) as a continuation of the second item of that list. Indent the code another 4 spaces for a total of 8, and the parser will recognize the code as code and format it correctly. – senderle Feb 25 '11 at 21:29