
I've got some data which originally looks like this which I'd like to process in Python 2.7:

id  year    value
1   2012    5
2   2012    50
3   2012    500
1   2013    6
2   2013    60
3   2013    600
1   2014    7
2   2014    70
3   2014    700

I can easily transform it into a list of lists like this: [[1,2012,5],[2,2012,50],...].

I'd like to convert this to a dictionary so I can look up all the different values for a fixed id and/or year. (If this idea is not so great and I should rather keep it as a list, please let me know in the comments.)

I know that a Python dictionary needs hashable keys, so I could transform this table by concatenating id and year into a string and have a dictionary like

{'1_2012':'5','2_2012':'50', ...} 
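Building such a dictionary and reading the key parts back would look roughly like this (a sketch only; rows is just a placeholder name for the list version of the data):

rows = [[1, 2012, 5], [2, 2012, 50], [3, 2012, 500]]
d = {'%d_%d' % (id_, year): value for id_, year, value in rows}
print d['2_2012']                          # 50
id_, year = map(int, '2_2012'.split('_'))  # parsing the key back out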

Obviously, this is not very elegant if you want to read out the separate parts of the key. What's the easiest way to get a compound-key dictionary that is still easy to dump to JSON?

Roland
  • If you want to look up in different ways, it sounds like you need *multiple* structures. Note that dictionary keys can be tuples, so e.g. `{(1, 2012): 5, ...}` would be valid; however, I don't know if JSON will handle it. – jonrsharpe Oct 23 '15 at 09:10

3 Answers


You can use tuples as keys. A tuple is created with parentheses, e.g. (2, 2012) is a tuple. Tuples are immutable and hashable, so they can be used as dict keys:

d = {(1,2012):5, (2,2012):50}

You can index into tuples just as you index into a list, e.g. (1,2012)[1] is 2012.
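Putting it together for the data in the question (a minimal sketch; rows and d are illustrative names, not from the original post):

rows = [[1, 2012, 5], [2, 2012, 50], [3, 2012, 500],
        [1, 2013, 6], [2, 2013, 60], [3, 2013, 600]]

# Build a tuple-keyed dict from the list-of-lists form
d = {(id_, year): value for id_, year, value in rows}

print d[(2, 2012)]                                 # 50
# All values for a fixed id, across years:
print {y: v for (i, y), v in d.items() if i == 2}  # {2012: 50, 2013: 60}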

Tom Karzes

tuple is hashable:

{(1, 2012): 5, (2, 2012): 50}

However, this dict cannot be dumped to JSON directly, because JSON object keys have to be strings:

import json
import ast

# Works as long as the keys satisfy the following requirement:
# key == ast.literal_eval(repr(key))
# This is true for tuples containing only numbers.

def dumps(d):
    return json.dumps({repr(key): value for key, value in d.items()})

def loads(s):
    d = json.loads(s)
    return {ast.literal_eval(key): value for key, value in d.items()}

This pair of functions will keep working until you start using keys that are too complex and/or whose __repr__ method is implemented poorly.
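A quick round trip with these two functions might look like this (d is just a sample dict for illustration):

d = {(1, 2012): 5, (2, 2012): 50}
s = dumps(d)           # keys are serialized as strings like "(1, 2012)"
print loads(s) == d    # True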

u354356007
  • I haven't implemented it yet, but this seems like the best idea. I haven't decided if I want to use `tuple` or `namedtuple` as key. The latter is also hashable, right? Do you know if this already touches an area of poor `__repr__` and/or too complex keys? – Roland Oct 24 '15 at 19:25
  • 1
    @Roland `__repr__` of a `namedtuple` is enough to satisfy `key == ast.literal_eval(repr(key))` - while both sender and receiver of the `json` string have the same definition of the `namedtuple` in their scope. This makes namedtuples not so different from just tuples, so I would go with the latter. Tuples occupy less space in RAM and `json`, and probably they work faster. However, if you have a lot of such data structures to send over multiple places, `namedtuple` should be picked, as `explicit is better than implicit`. – u354356007 Oct 24 '15 at 20:36

For this purpose I would suggest using a namedtuple, as it's clearer than a standard tuple.

import collections
Key = collections.namedtuple('Key', ['id', 'year'])
data = {Key(id=1, year=2012): 5, Key(id=2, year=2012): 50}
Wolph
  • I was under the impression that `namedtuple`s are not very easy to `dump` (see http://stackoverflow.com/questions/5906831/serializing-a-python-namedtuple-to-json) – Roland Oct 23 '15 at 09:42
  • It takes a little bit of extra effort, but with a predefined format like this it's fairly easy to do. – Wolph Oct 23 '15 at 10:05
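That "little bit of extra effort" could look roughly like this (a sketch only; dump_data and load_data are made-up names, not Wolph's actual code). It reuses the 'id_year' string form from the question for the JSON keys:

import json
import collections

Key = collections.namedtuple('Key', ['id', 'year'])
data = {Key(id=1, year=2012): 5, Key(id=2, year=2012): 50}

def dump_data(d):
    # Flatten each Key into an 'id_year' string so JSON accepts it as a key
    return json.dumps({'%d_%d' % (k.id, k.year): v for k, v in d.items()})

def load_data(s):
    # Rebuild the Key from the string on the way back in
    return {Key(*map(int, k.split('_'))): v
            for k, v in json.loads(s).items()}

print load_data(dump_data(data)) == data   # True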