Efficiently Searching Nested Lists

Question

I have a csv file full of tax data. I read the data into nested lists, so that it is formatted like this:

['Alabama', 'Single rate', '0.02', '0.04', '5']
['Alabama', 'Single bracket', '500', '3000']
['Alabama', 'Couple rate', '0.02', '0.04', '0.05']
['Alabama', 'Couple bracket', '1000', '6000']

I would like to be able to input a state and marital status and get back the relevant lists of rates and brackets. I've done so here, but I feel like there must be a much simpler approach. Any suggestions?

search_state  = 'Alabama'
search_status = 'Single'
rates = []
brackets = []
for sublist in cleaned_data:
  if search_state in sublist[0] and search_status in sublist[1]:
    if 'rate' in sublist[1]:
      rates = [eval(x) for x in sublist[2:]]
    if 'bracket' in sublist[1]:
      brackets = [eval(x) for x in sublist[2:]]

It isn't clear from your post what exact format `cleaned_data` is in. — Mike Graham, Jan 31 '14 at 15:52
If you want to be able to do this many times with lots of data, this is exactly the sort of thing databases exist for. The stdlib `sqlite3` module is very good. — Mike Graham, Jan 31 '14 at 15:53
You basically never, ever, ever want to use `eval`. If you have a string like `'"foo"'` and want to get a string like `'foo'`, try `s[1:-1]` or, if you really, really insist, `ast.literal_eval(s)`. — Mike Graham, Jan 31 '14 at 15:54
@tobias_k If he is getting csv data then the overhead of constructing the dictionary is much the same as the overhead of him constructing the `rates` and `brackets` lists. — 2rs2ts, Jan 31 '14 at 15:55
`sublist` is a name that doesn't help reveal the format of your data. You might end up writing clear code by trying to give clear names, like maybe `for states, statuses in cleaned_data:`. (I don't know the actual format you have to know the real meaning that should lead to the name.) — Mike Graham, Jan 31 '14 at 15:57
If you can read the data into a nested list, why can't you read it into a proper dictionary? — James Sapam, Jan 31 '14 at 16:04
@yopy He said that it is a [csv](http://en.wikipedia.org/wiki/Comma-separated_values) file. — 2rs2ts, Jan 31 '14 at 16:12
OK, then he can convert it directly into a nested dictionary while reading from the file, right? — James Sapam, Jan 31 '14 at 16:18
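On the `eval` point raised in the comments, here is a minimal sketch of converting the fields without evaluating arbitrary code. It assumes the fields are plain numeric strings, as in the sample rows; the helper name `parse_numbers` is hypothetical and not part of the original post.

```python
import ast

row = ['Alabama', 'Single rate', '0.02', '0.04', '5']

def parse_numbers(fields):
    """Convert numeric strings to floats; raises ValueError on anything else."""
    return [float(x) for x in fields]

rates = parse_numbers(row[2:])
print(rates)  # [0.02, 0.04, 5.0]

# ast.literal_eval accepts only Python literals, so '5' stays an int
mixed = [ast.literal_eval(x) for x in row[2:]]
print(mixed)  # [0.02, 0.04, 5]
```

`float` is probably the simpler choice here, since every value ends up with the same type.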

dawg · Accepted Answer · 2014-02-01T20:43:07.367

You would be better served with a nested dictionary:

rates={'Alabama':{'Single rate':['0.02', '0.04', '5'],
                  'Single bracket': ['500', '3000'],
                  'Couple rate': ['0.02', '0.04', '0.05'],
                  'Couple bracket': ['1000', '6000']}}

print(rates['Alabama']['Couple rate'])
# ['0.02', '0.04', '0.05']

Assuming your csv file looks like this:

'Alabama', 'Single rate', '0.02', '0.04', '5'
'Alabama', 'Single bracket', '500', '3000'
'Alabama', 'Couple rate', '0.02', '0.04', '0.05'
'Alabama', 'Couple bracket', '1000', '6000'

You can construct the nested dict this way:

import csv

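rates = {}
# ur_file is a placeholder for the path to your csv file
with open(ur_file) as f:
    for line in csv.reader(f, skipinitialspace=True, quotechar="'"):
        # line[0] is the state, line[1] the label, the rest are numbers
        rates.setdefault(line[0], {})[line[1]] = [float(e) for e in line[2:]]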

print(rates)

Prints:

{'Alabama': {'Couple rate': [0.02, 0.04, 0.05], 
 'Single rate': [0.02, 0.04, 5.0], 
 'Single bracket': [500.0, 3000.0], 
 'Couple bracket': [1000.0, 6000.0]}}
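To connect this back to the state and status search in the question, here is a rough sketch that builds the same two-level dict from an in-memory copy of the sample rows and then looks up one state and status. The `io.StringIO` stand-in and the `lookup` helper are assumptions for illustration, not part of the original answer.

```python
import csv
import io

# Stand-in for the csv file, using the sample rows shown above
sample = io.StringIO(
    "'Alabama', 'Single rate', '0.02', '0.04', '5'\n"
    "'Alabama', 'Single bracket', '500', '3000'\n"
    "'Alabama', 'Couple rate', '0.02', '0.04', '0.05'\n"
    "'Alabama', 'Couple bracket', '1000', '6000'\n"
)

rates = {}
for line in csv.reader(sample, skipinitialspace=True, quotechar="'"):
    rates.setdefault(line[0], {})[line[1]] = [float(e) for e in line[2:]]

def lookup(table, state, status):
    """Return (rates, brackets) for a state and marital status."""
    by_state = table[state]  # raises KeyError for an unknown state
    return by_state[status + ' rate'], by_state[status + ' bracket']

single_rates, single_brackets = lookup(rates, 'Alabama', 'Single')
print(single_rates)     # [0.02, 0.04, 5.0]
print(single_brackets)  # [500.0, 3000.0]
```

Each lookup is two dictionary accesses instead of a scan over every row.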

Edit

As pointed out in the comments, a three tier nested dict is probably better, like this data structure:

rates={'Alabama':{'Single': {'rate':['0.02', '0.04', '5'],
                             'bracket': ['500', '3000']},
                  'Couple': {'rate': ['0.02', '0.04', '0.05'],
                             'bracket': ['1000', '6000']}}}

While it is trivial to use defaultdict or setdefault to deal with a two tier dict with missing keys, it takes a little more thought to deal with multiple levels elegantly.

My favorite is a Perl-like autovivification subclass of dict, like so:

class AutoVivify(dict):
    """Implementation of perl's autovivification feature."""
    def __missing__(self, item):
        value = self[item] = type(self)()
        return value 

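rates = AutoVivify()
# ur_file is a placeholder for the path to your csv file
with open(ur_file) as f:
    for line in csv.reader(f, skipinitialspace=True, quotechar="'"):
        state = line[0]
        # 'Single rate' -> k1='Single', k2='rate'
        k1, k2 = line[1].split()
        rates[state][k1][k2] = [float(e) for e in line[2:]]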

print(rates)

Prints:

{'Alabama': {'Single': {'rate': [0.02, 0.04, 5.0],
                        'bracket': [500.0, 3000.0]},
             'Couple': {'rate': [0.02, 0.04, 0.05],
                        'bracket': [1000.0, 6000.0]}}}
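One thing to watch with this approach: indexing a missing key creates it, so a mistyped state name in a lookup silently adds an empty entry. Below is a small sketch of a read-only lookup that uses `dict.get`, which does not trigger `__missing__`. The `lookup` helper is hypothetical, and the data is entered by hand rather than read from a file.

```python
class AutoVivify(dict):
    """Implementation of Perl's autovivification feature."""
    def __missing__(self, item):
        value = self[item] = type(self)()
        return value

rates = AutoVivify()
rates['Alabama']['Single']['rate'] = [0.02, 0.04, 5.0]
rates['Alabama']['Single']['bracket'] = [500.0, 3000.0]

def lookup(table, state, status):
    """Return (rates, brackets), or None if absent; never creates keys."""
    entry = table.get(state, {}).get(status)
    if not entry:
        return None
    return entry.get('rate'), entry.get('bracket')

print(lookup(rates, 'Alabama', 'Single'))  # ([0.02, 0.04, 5.0], [500.0, 3000.0])
print(lookup(rates, 'Texas', 'Single'))    # None
print('Texas' in rates)                    # False

rates['Texas']['Single']   # plain indexing creates empty entries
print('Texas' in rates)    # True
```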

+1, but it might be even better to have three levels of dicts, since the "search query" seems to be just "single", and the expected result the rate and bracket value, i.e. `{'Alabama': {'Couple': {'rate': [0.02, 0.04, 0.05], ...` — tobias_k, Jan 31 '14 at 21:33
Of course you are all correct - a nested dictionary makes much more sense. — user1978374, Feb 01 '14 at 18:41
Also, thanks to @mike-graham for the suggestions to clean up the code. — user1978374, Feb 01 '14 at 18:43
@tobias_k: A little more thought to do it that way, but I agree -- a better way to do it. Thanks! — dawg, Feb 01 '14 at 20:21
Another way to do the "autovivification" is [this](http://stackoverflow.com/a/12590568/1639625): `infinitedict = lambda : defaultdict(infinitedict)` — tobias_k, Feb 01 '14 at 21:27
@tobias_k: I know, but then the printed dictionary is really ugly. +1 tho — dawg, Feb 01 '14 at 22:46
@dawg That's true, and your `AutoVivify` is indeed a nice alternative. Thanks! — tobias_k, Feb 01 '14 at 22:56
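For comparison, here is a sketch of the `defaultdict` variant mentioned in the comments, written as a named function rather than a lambda. The JSON round trip is just one assumed way to get plain nested dicts for printing, since the `repr` of a nested `defaultdict` shows the wrapper at every level.

```python
from collections import defaultdict
import json

def infinitedict():
    """A defaultdict whose missing values are more infinitedicts."""
    return defaultdict(infinitedict)

rates = infinitedict()
rates['Alabama']['Couple']['rate'] = [0.02, 0.04, 0.05]

# Convert to plain nested dicts for readable output
plain = json.loads(json.dumps(rates))
print(plain)  # {'Alabama': {'Couple': {'rate': [0.02, 0.04, 0.05]}}}
```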
