create optimum data structure python

Question

I am cross-referencing two data sources which share 6 common fields. The idea is that the marketing costs in file 1 are split out over the sales transactions in file 2. I've written a way to build a data structure from the first file so that the second one can access it quickly, but it seems unpythonic to me. I'd like some input and opinions on whether it could be written in a better way.

import time
from decimal import Decimal

# marketing_costs is assumed to be an iterable of dicts of strings
# (for example, rows read from file 1 with csv.DictReader).
cost_matrix = {}
for line in marketing_costs:
    line_date_object = time.strptime(line['date'], "%d/%m/%Y")
    period = '%04d_%02d' % (line_date_object.tm_year, line_date_object.tm_mon)
    territory = line['territory'].lower()
    salesperson = line['salesperson'].lower()
    customer_type = line['customer_type'].lower()
    affiliate = line['affiliate'].lower()
    product_group = line['product_group'].lower()
    line_mktg_cost = line['mktg_cost']
    # Create each missing level of the nested dict, one try/except per level.
    try:
        cost_matrix[period]
    except KeyError:
        cost_matrix[period] = {}
    try:
        cost_matrix[period][territory]
    except KeyError:
        cost_matrix[period][territory] = {}
    try:
        cost_matrix[period][territory][salesperson]
    except KeyError:
        cost_matrix[period][territory][salesperson] = {}
    try:
        cost_matrix[period][territory][salesperson][customer_type]
    except KeyError:
        cost_matrix[period][territory][salesperson][customer_type] = {}
    try:
        cost_matrix[period][territory][salesperson][customer_type][affiliate]
    except KeyError:
        cost_matrix[period][territory][salesperson][customer_type][affiliate] = {}
    try:
        cost_matrix[period][territory][salesperson][customer_type][affiliate][product_group]
    except KeyError:
        cost_matrix[period][territory][salesperson][customer_type][affiliate][product_group] = {}
        cost_matrix[period][territory][salesperson][customer_type][affiliate][product_group]['mktg_cost'] = 0
    # Accumulate this row's cost into the innermost ('mktg_cost') total.
    cost_matrix[period][territory][salesperson][customer_type][affiliate][product_group]['mktg_cost'] += Decimal(line_mktg_cost)

abarnert · Accepted Answer · 2015-04-24T10:54:38.233

Every one of those 4-line try/except blocks can be replaced by a 1-liner using setdefault:

setdefault(key[, default])

If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
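A quick, self-contained illustration of that documented behaviour (the dict `d` and its keys are just placeholders):

```python
# dict.setdefault returns the existing value when the key is present,
# and otherwise inserts the default and returns it.
d = {'a': 1}

existing = d.setdefault('a', 99)   # key present: stored value is unchanged
inserted = d.setdefault('b', [])   # key missing: inserts [] and returns it
inserted.append('x')               # the returned object is the one stored in d

print(existing)  # 1
print(d)         # {'a': 1, 'b': ['x']}
```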

So this:

cost_matrix[period].setdefault(territory, {})

… is equivalent to:

try:
    cost_matrix[period][territory]
except KeyError:
    cost_matrix[period][territory]={}

The difference is that setdefault is an expression that returns the value, so you can use it inside a larger expression. In theory that means you can turn the whole thing into one giant expression if you want to, although I'm not sure I'd do that.
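As a sketch of what that chaining could look like, here is the question's loop with every try/except block replaced by one chained setdefault expression. The `build_cost_matrix` name and the sample rows are hypothetical; the rows are assumed to be dicts of strings shaped like the question's `marketing_costs`.

```python
import time
from decimal import Decimal

# Hypothetical sample rows standing in for the question's marketing_costs.
marketing_costs = [
    {'date': '03/04/2015', 'territory': 'North', 'salesperson': 'Ann',
     'customer_type': 'Retail', 'affiliate': 'None', 'product_group': 'Widgets',
     'mktg_cost': '10.50'},
    {'date': '17/04/2015', 'territory': 'north', 'salesperson': 'ANN',
     'customer_type': 'retail', 'affiliate': 'none', 'product_group': 'widgets',
     'mktg_cost': '4.25'},
]

def build_cost_matrix(rows):
    """Build the nested cost dict with one chained setdefault per row."""
    cost_matrix = {}
    for line in rows:
        date = time.strptime(line['date'], '%d/%m/%Y')
        period = '%04d_%02d' % (date.tm_year, date.tm_mon)
        # Each setdefault returns the (possibly new) inner dict for the next level.
        leaf = (cost_matrix.setdefault(period, {})
                .setdefault(line['territory'].lower(), {})
                .setdefault(line['salesperson'].lower(), {})
                .setdefault(line['customer_type'].lower(), {})
                .setdefault(line['affiliate'].lower(), {})
                .setdefault(line['product_group'].lower(), {}))
        leaf.setdefault('mktg_cost', 0)   # the leaf starts at 0, not a dict
        leaf['mktg_cost'] += Decimal(line['mktg_cost'])
    return cost_matrix

matrix = build_cost_matrix(marketing_costs)
print(matrix['2015_04']['north']['ann']['retail']['none']['widgets']['mktg_cost'])
```

Both sample rows normalise to the same keys, so their costs end up summed in a single leaf.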

You can simplify things even further by using a recursive defaultdict. A defaultdict is a dict subclass that, when you look up a missing key, calls its default factory, stores the result under that key, and returns it; a recursive one uses another defaultdict of the same kind as that default instead of a plain dict. (You do still need one setdefault, or an explicit assignment, at the end to handle the default of 0 instead of yet another dict…)

Like this:

from collections import defaultdict
from decimal import Decimal

def recursive_defaultdict():
    # Each missing key creates another recursive defaultdict.
    return defaultdict(recursive_defaultdict)

cost_matrix = recursive_defaultdict()  # create once, before the loop

# Inside the loop:
leaf = cost_matrix[period][territory][salesperson][customer_type][
    affiliate][product_group]
leaf.setdefault('mktg_cost', 0)  # the leaf needs 0, not another dict
leaf['mktg_cost'] += Decimal(line_mktg_cost)
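Assembled into a loop, the recursive-defaultdict approach might look like this sketch. The key tuples and costs are hypothetical and assumed to be already normalised (period computed, fields lowercased) as in the question; the last lines show that merely reading a missing key creates it instead of raising KeyError.

```python
from collections import defaultdict
from decimal import Decimal

def recursive_defaultdict():
    # Any missing key maps to a fresh recursive_defaultdict.
    return defaultdict(recursive_defaultdict)

# Hypothetical, already-normalised (keys, cost) pairs.
rows = [
    (('2015_04', 'north', 'ann', 'retail', 'none', 'widgets'), '10.50'),
    (('2015_04', 'north', 'ann', 'retail', 'none', 'widgets'), '4.25'),
    (('2015_05', 'south', 'bob', 'trade', 'acme', 'gadgets'), '7.00'),
]

cost_matrix = recursive_defaultdict()
for keys, cost in rows:
    node = cost_matrix
    for key in keys:              # intermediate levels are created automatically
        node = node[key]
    node.setdefault('mktg_cost', 0)   # the leaf needs 0, not another dict
    node['mktg_cost'] += Decimal(cost)

print(cost_matrix['2015_04']['north']['ann']['retail']['none']['widgets']['mktg_cost'])
# 14.75

print('2015_06' in cost_matrix)  # False
cost_matrix['2015_06']           # no KeyError: silently adds an empty level
print('2015_06' in cost_matrix)  # True
```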

However, be aware that this means you'll never get a KeyError anywhere else in your code either: looking up a missing key just silently creates a new, empty level. If that's not acceptable, then stick with setdefault. (Although if you build the dict up first and only then use it, you can "freeze" it into a normal dict just by recursively copying it…)
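A minimal sketch of that recursive copy, assuming the nested structure contains only dicts (or defaultdicts) and leaf values; `freeze` is a hypothetical helper name, not a standard-library function. Because defaultdict subclasses dict, a single isinstance check covers both.

```python
from collections import defaultdict

def recursive_defaultdict():
    return defaultdict(recursive_defaultdict)

def freeze(d):
    """Recursively copy nested (default)dicts into plain dicts.

    Hypothetical helper: after freezing, missing keys raise KeyError again.
    """
    if isinstance(d, dict):   # also true for defaultdict instances
        return {key: freeze(value) for key, value in d.items()}
    return d

tree = recursive_defaultdict()
tree['2015_04']['north']['mktg_cost'] = 5

frozen = freeze(tree)
print(type(frozen['2015_04']))   # <class 'dict'>
try:
    frozen['2015_05']
except KeyError:
    print('KeyError, as with a normal dict')
```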

Thank you, this is great! I will use setdefault because other parts of the script depend on KeyError exceptions. — teebagz, Apr 24 '15 at 10:37
@TommyGaboreau: There's also a recipe on ActiveState somewhere for a recursive defaultdict that you can freeze and unfreeze on the fly, if you want to search for that. (I used it once just because it's such a neat idea, but I've never used it again in real code because I've never had a good use for it… but maybe you do.) — abarnert, Apr 24 '15 at 10:55
It's good to know that such a thing exists, but for now your setdefault solution is perfect. Thank you. — teebagz, Apr 24 '15 at 11:41
