faster and more 'pythonic' list of dictionaries

Question

For simplicity, I've provided 2 lists in a list, but I'm actually dealing with about a hundred lists in a list, each containing a sizable number of dictionaries. I only want to get the value of the 'status' key from the first dictionary of each list, without checking the other dictionaries in that list (I know they all share the same value for that key). Then I will perform some sort of clustering within each big dictionary, and I need to efficiently concatenate all the 'title' values. Is there a way to make my code more elegant and much faster?

I have:

nested = [
    [
        {'id': 287, 'title': 'hungry badger',  'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas','status': 'High'}
    ],
    [
        {'id': 456, 'title': 'happy title here','status': 'Medium'},
        {'id': 342,'title': 'soft big bear','status': 'Medium'}
    ]
]

I'd like:

result = [
    {
        'High': [
            {'id': 287, 'title': 'hungry badger'},
            {'id': 437, 'title': 'roadtrip to Kansas'}
        ]
    },
    {
        'Medium': [
            {'id': 456, 'title': 'happy title here'},
            {'id': 342, 'title': 'soft big bear'}
        ]
    }
]

What I tried:

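result = []
for oneList in nested:
    groups = {}
    for i in oneList:
        a = list(i.keys())
        # every key other than 'id' and 'title' (here just 'status')
        m = [i[key] for key in a if key not in ['id', 'title']]
        groups[m[0]] = oneList
        # strip the extra keys in place, which mutates the input dicts
        for key in a:
            if key not in ['id', 'title']:
                del i[key]
    result.append(groups)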
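A minimal sketch of the approach the question describes: read 'status' only from the first dictionary of each inner list (assuming, as stated, that every dictionary in that list shares it) and build new dicts holding just 'id' and 'title', so the input is not mutated. The helper names `group_by_first_status` and `join_titles` are hypothetical, and reading "concatenate all 'title' values" as a space-joined string is an assumption.

```python
# Sample data from the question.
nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]


def group_by_first_status(nested):
    """Return one {status: [{'id', 'title'} dicts]} mapping per inner list.

    Assumes every dict in an inner list shares the same 'status', so only
    the first dict is inspected. Empty inner lists are skipped.
    """
    result = []
    for sublist in nested:
        if not sublist:
            continue
        status = sublist[0]['status']  # only the first dict is checked
        # Build new dicts instead of deleting keys, so the input stays intact.
        trimmed = [{'id': d['id'], 'title': d['title']} for d in sublist]
        result.append({status: trimmed})
    return result


def join_titles(sublist, sep=' '):
    """Concatenate the 'title' values of one inner list (hypothetical helper)."""
    return sep.join(d['title'] for d in sublist)


result = group_by_first_status(nested)
print(result)
print(join_titles(nested[0]))  # hungry badger roadtrip to Kansas
```

`str.join` over a generator is the idiomatic way to concatenate many strings in one pass, rather than building the result with repeated `+`.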

score 2 · Answer 1 · edited Sep 17 '16 at 00:14


from itertools import groupby    
result = groupby(sum(nested,[]), lambda x: x['status'])

How it works:

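sum(nested, []) concatenates all your inner lists into one big list of dictionaries

groupby(..., lambda x: x['status']) groups consecutive dictionaries that share the same 'status' value; this works here because each inner list holds a single status, so equal statuses are already adjacent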

Note that itertools.groupby returns a lazy iterator of (key, group) pairs (not a list), and each group is itself an iterator, so if you want to materialize the result you need to do something like the following.

from itertools import groupby    
result = groupby(sum(nested,[]), lambda x: x['status'])
result = {key:list(val) for key,val in result}
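One caveat: `groupby` only merges *adjacent* items with equal keys. That is fine for the question's data, but if the same status appeared in two non-adjacent inner lists, the dict comprehension above would keep only the last run for that key. A minimal sketch, using a hypothetical extra 'High' list appended to the sample data, shows the effect and the usual fix of sorting by the same key first:

```python
from itertools import chain, groupby
from operator import itemgetter

nested = [
    [{'id': 287, 'title': 'hungry badger', 'status': 'High'}],
    [{'id': 456, 'title': 'happy title here', 'status': 'Medium'}],
    # hypothetical extra list: 'High' again, not adjacent to the first one
    [{'id': 999, 'title': 'another high item', 'status': 'High'}],
]

by_status = itemgetter('status')
flat = list(chain.from_iterable(nested))

# Unsorted: 'High' forms two separate runs, and the second run overwrites the first.
unsorted_groups = {key: list(grp) for key, grp in groupby(flat, by_status)}

# Sorted by the same key first: each status forms exactly one run.
# sorted() is stable, so items keep their original order within a status.
sorted_groups = {
    key: list(grp) for key, grp in groupby(sorted(flat, key=by_status), by_status)
}
```

Sorting costs O(n log n); when the input is already grouped by status, as in the question, it can be skipped.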

answered Sep 17 '16 at 00:09 by gnicholas · edited by Soviut

omg! @.@ Wow. you are so fast with the solution. Thank you so very much!!!! Works perfectly. – el347 Sep 17 '16 at 00:17

One: Don't use `sum(nested, [])`, ever. It's the slowest possible way to flatten, and it gets slower the more you're flattening (it's creating `n` temporary `list`s, growing each time). You're already using `itertools`, and you're iterating the results (don't need a true `list` at all), so just use `itertools.chain.from_iterable` to flatten (and because `lambda` is evil/slow when not needed, `operator.itemgetter` for `key`): `groupby(chain.from_iterable(nested), itemgetter('status'))`. [`sum(x, [])` is _slow_ (see comments)](http://stackoverflow.com/a/39520827/364696). – ShadowRanger Sep 17 '16 at 01:09
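A rough sketch for checking the flattening claim yourself with `timeit`: it first confirms both approaches produce the same list, then times them on synthetic data (100 inner lists of 1,000 small dicts each, an assumed size loosely matching the question's "hundred lists"). No timings are shown because they depend on the machine; the reason `sum` scales badly is that each `+` copies the whole accumulated list, so its cost grows quadratically with the number of inner lists.

```python
import timeit
from itertools import chain

# Synthetic data: 100 inner lists, each with 1,000 small dicts (assumed sizes).
nested = [
    [{'id': i * 1000 + j, 'title': f'title {j}', 'status': f'S{i}'} for j in range(1000)]
    for i in range(100)
]

# Both approaches flatten to the same sequence of dicts.
flat_sum = sum(nested, [])
flat_chain = list(chain.from_iterable(nested))
assert flat_sum == flat_chain

# Time each approach; list() materializes chain's lazy output for a fair comparison.
t_sum = timeit.timeit(lambda: sum(nested, []), number=5)
t_chain = timeit.timeit(lambda: list(chain.from_iterable(nested)), number=5)
print(f'sum(nested, []):             {t_sum:.3f} s')
print(f'chain.from_iterable(nested): {t_chain:.3f} s')
```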

@ShadowRanger Thanks man! Just ran this: `from itertools import chain, groupby; import operator; s = groupby(chain.from_iterable(results), key=operator.itemgetter('status'))`, followed by `for key, grp in s: print(key, list(grp))`. All good. – el347 Sep 17 '16 at 01:31
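Putting the comment's suggestions together, a minimal sketch that flattens with `chain.from_iterable`, groups with `itemgetter('status')`, and builds new `{'id', 'title'}` dicts, producing the exact list-of-one-key-dicts shape the question asks for. Like the answer above, it assumes equal statuses are adjacent; unlike the question's attempt, it leaves the original dicts (and their 'status' keys) untouched.

```python
from itertools import chain, groupby
from operator import itemgetter

nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]

# Flatten lazily, then group adjacent dicts by their 'status' value.
grouped = groupby(chain.from_iterable(nested), key=itemgetter('status'))

# Each group iterator is consumed immediately by the inner comprehension,
# before groupby advances to the next key.
result = [
    {status: [{'id': d['id'], 'title': d['title']} for d in grp]}
    for status, grp in grouped
]
print(result)
```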

score 2 · Answer 2 · answered Sep 17 '16 at 00:20

You could make a defaultdict for each nested list:

import collections

nested = [
    [{'id': 287, 'title': 'hungry badger', 'status': 'High'},
     {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'}],
    [{'id': 456, 'title': 'happy title here', 'status': 'Medium'},
     {'id': 342, 'title': 'soft big bear', 'status': 'Medium'}],
]

result = []
for sublist in nested:
    # one defaultdict per inner list: status -> list of trimmed dicts
    groups = collections.defaultdict(list)
    for d in sublist:
        name = d.pop('status')  # removes 'status' from d in place
        groups[name].append(d)
    # convert to a plain dict so it prints as {'High': [...]}
    result.append(dict(groups))

This gives the following result:

>>> import pprint
>>> pprint.pprint(result)
[{'High': [{'id': 287, 'title': 'hungry badger'},
           {'id': 437, 'title': 'roadtrip to Kansas'}]},
 {'Medium': [{'id': 456, 'title': 'happy title here'},
             {'id': 342, 'title': 'soft big bear'}]}]
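If you would rather not mutate the input (`d.pop('status')` removes the key from the original dicts) and want one mapping across all inner lists instead of one per list, a variation on the same `defaultdict` idea is sketched below. It copies only 'id' and 'title' and does not depend on equal statuses being adjacent; the merged single-dict output shape is an assumption, not what the question asked for.

```python
import collections

nested = [
    [{'id': 287, 'title': 'hungry badger', 'status': 'High'},
     {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'}],
    [{'id': 456, 'title': 'happy title here', 'status': 'Medium'},
     {'id': 342, 'title': 'soft big bear', 'status': 'Medium'}],
]

merged = collections.defaultdict(list)
for sublist in nested:
    for d in sublist:
        # copy the wanted fields; the original dict keeps its 'status'
        merged[d['status']].append({'id': d['id'], 'title': d['title']})
merged = dict(merged)  # plain dict for a clean repr
print(merged)
```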

Awesome. Thanks! Learning something new here every day. Hehe. That itertools groupby solution looks so good; it reduces complexity. Your answer taught me about collections.defaultdict(). Thanks again. – el347, Sep 17 '16 at 00:39
Oh nice! Your solution removes 'status' and does what I was asking. I'm going to time both of these just for the heck of it and also play a bit more with itertools' groupby... Thanks for the help, man! – el347, Sep 17 '16 at 01:07
