
For simplicity, I've shown 2 lists in a list, but I'm actually dealing with about a hundred lists in a list, each containing a sizable number of dictionaries. I only want to read the value of the 'status' key from the 1st dictionary, without checking any other dictionaries in that list (since I know they all contain the same value at that key). Then I will perform some sort of clustering within each big dictionary. I also need to efficiently concatenate all 'title' values. Is there a way to make my code more elegant and much faster?

I have:

nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'}
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'}
    ]
]

I'd like:

result = [
    {
        'High': [
            {'id': 287, 'title': 'hungry badger'},
            {'id': 437, 'title': 'roadtrip to Kansas'}
        ]
    },
    {
        'Medium': [
            {'id': 456, 'title': 'happy title here'},
            {'id': 342, 'title': 'soft big bear'}
        ]
    }
]

What I tried:

for oneList in nested:
    result = {}
    for i in oneList:
        a = list(i.keys())
        m = [i[key] for key in a if key not in ['id', 'title']]
        result[m[0]] = oneList
        for key in a:
            if key not in ['id', 'title']:
                del i[key]
Soviut
el347

2 Answers

from itertools import groupby    
result = groupby(sum(nested,[]), lambda x: x['status'])

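How it works:

sum(nested, []) concatenates all the inner lists into one big list of dictionaries.

groupby(..., lambda x: x['status']) groups consecutive dictionaries that share the same 'status' value. Dictionaries with the same status that are not adjacent end up in separate groups, so sort by status first if that can happen.

Note that itertools.groupby returns an iterator (not a list), and each group is itself a lazy iterator. To materialize the result, do something like the following.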

from itertools import groupby    
result = groupby(sum(nested,[]), lambda x: x['status'])
result = {key:list(val) for key,val in result}
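
To make the "consecutive" behaviour of groupby concrete, here is a minimal sketch. The `records` list is hypothetical sample data (not from the question) in which 'High' appears in two separate runs; the names `by_status`, `unsorted_keys`, and `grouped` are also just illustrative.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample data: 'High' appears in two non-adjacent runs.
records = [
    {'id': 1, 'status': 'High'},
    {'id': 2, 'status': 'Medium'},
    {'id': 3, 'status': 'High'},
]

by_status = itemgetter('status')

# Without sorting, groupby starts a new group every time the key changes,
# so 'High' shows up twice.
unsorted_keys = [key for key, _ in groupby(records, by_status)]

# Sorting by the same key first makes equal statuses adjacent.
# sorted() is stable, so ids keep their original relative order.
grouped = {
    key: [d['id'] for d in group]
    for key, group in groupby(sorted(records, key=by_status), by_status)
}

print(unsorted_keys)
print(grouped)
```

If every inner list already holds a single status and no status repeats across inner lists, as in the question's sample, the sort is unnecessary.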
Soviut
gnicholas
  • omg! @.@ Wow. you are so fast with the solution. Thank you so very much!!!! Works perfectly. – el347 Sep 17 '16 at 00:17
  • One: Don't use `sum(nested, [])`, ever. It's the slowest possible way to flatten, and it gets slower the more you're flattening (it's creating `n` temporary `list`s, growing each time). You're already using `itertools`, and you're iterating the results (don't need a true `list` at all), so just use `itertools.chain.from_iterable` to flatten (and because `lambda` is evil/slow when not needed, `operator.itemgetter` for `key`): `groupby(chain.from_iterable(nested), itemgetter('status'))`. [`sum(x, [])` is _slow_ (see comments)](http://stackoverflow.com/a/39520827/364696). – ShadowRanger Sep 17 '16 at 01:09
  • @ShadowRanger Thanks man! Just ran this: from itertools import chain; import operator; s= groupby(chain.from_iterable(results), key=operator.itemgetter('status')); for key, grp in s: print(key, list(grp)) All good. – el347 Sep 17 '16 at 01:31
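
Putting the suggestion from these comments together, a sketch of the lazy version might look like the following. It reuses the question's `nested` data; the variable name `flat` is only illustrative, and the code assumes that dictionaries with equal status are already adjacent (true for the sample data).

```python
from itertools import chain, groupby
from operator import itemgetter

nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]

# chain.from_iterable flattens lazily, without building intermediate lists.
flat = chain.from_iterable(nested)

# Each group is a lazy iterator, so turn it into a list right away.
result = {
    status: list(group)
    for status, group in groupby(flat, itemgetter('status'))
}

for status, items in result.items():
    print(status, [d['title'] for d in items])
```

Note that the grouped dictionaries still contain their 'status' key; this version only groups, it does not strip anything.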

You could make a defaultdict for each nested list:

import collections

nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]

result = []
for inner in nested:
    groups = collections.defaultdict(list)
    for d in inner:
        # pop() removes 'status' from the original dict in place
        status = d.pop('status')
        groups[status].append(d)
    # store a plain dict so the result prints without the defaultdict wrapper
    result.append(dict(groups))

This gives the following result:

>>> import pprint
>>> pprint.pprint(result)
[{'High': [{'id': 287, 'title': 'hungry badger'},
           {'id': 437, 'title': 'roadtrip to Kansas'}]},
 {'Medium': [{'id': 456, 'title': 'happy title here'},
             {'id': 342, 'title': 'soft big bear'}]}]
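
Since the question says every dictionary in an inner list carries the same status, a variant that reads 'status' from the first dictionary only, and leaves the input untouched, could be sketched as follows. The function name `group_by_first_status`, the choice to skip empty inner lists, and the space used to join titles are all assumptions made for illustration.

```python
nested = [
    [
        {'id': 287, 'title': 'hungry badger', 'status': 'High'},
        {'id': 437, 'title': 'roadtrip to Kansas', 'status': 'High'},
    ],
    [
        {'id': 456, 'title': 'happy title here', 'status': 'Medium'},
        {'id': 342, 'title': 'soft big bear', 'status': 'Medium'},
    ],
]


def group_by_first_status(groups):
    """Build one {status: [dicts without 'status']} mapping per inner list."""
    result = []
    for inner in groups:
        if not inner:
            continue  # assumption: empty inner lists are simply skipped
        status = inner[0]['status']  # only the first dict is inspected
        # Copy each dict without 'status' instead of mutating the input.
        stripped = [
            {k: v for k, v in d.items() if k != 'status'} for d in inner
        ]
        result.append({status: stripped})
    return result


result = group_by_first_status(nested)

# Concatenate the titles per status with a single str.join call
# (the space separator is an assumption).
titles = {
    status: ' '.join(d['title'] for d in items)
    for group in result
    for status, items in group.items()
}

print(result)
print(titles)
```

Building new dictionaries costs some extra memory compared with `pop`, but `nested` stays usable afterwards.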
TigerhawkT3
  • Awesome, thanks! Learning something new on here every day. That itertools groupby solution looks so good; it reduces complexity. Your answer taught me about collections.defaultdict(). Thanks again. – el347 Sep 17 '16 at 00:39
  • Oh nice! Your solution removes status and does what I was asking. I'm going to time both of these just for the heck of it and also play a bit more with itertools' groupby... Thanks for the help, man! – el347 Sep 17 '16 at 01:07