The following does not require a pre-sorted iterable and runs in O(n) time. However, it assumes an asymmetry between `state` and the other dictionary keys: `state` is the key you group by, and the other values are collected under it (which, given your example, seems to be a correct assumption).
```python
import collections

def pack(iterable):
    # Group cities by state in a single pass over the input.
    out = collections.defaultdict(list)  # or use defaultdict(set)
    for d in iterable:
        out[d['state']].append(d['city'])
    return out

it = [
    {'state': '1', 'city': 'a'},
    {'state': '1', 'city': 'b'},
    {'state': '2', 'city': 'c'},
    {'state': '2', 'city': 'd'},
    {'state': '3', 'city': 'e'}
]

pack(it) == {'1': ['a', 'b'],
             '2': ['c', 'd'],
             '3': ['e']}
```
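For contrast, a common grouping approach that *does* need pre-sorted input is `itertools.groupby`, which only merges adjacent items with equal keys. The sketch below (the name `pack_groupby` is hypothetical) shows that it has to sort first, which costs O(n log n), and what happens if you skip the sort:

```python
import itertools
import operator

def pack_groupby(iterable):
    # groupby only merges *consecutive* items with equal keys,
    # so the input must be sorted by that key first (O(n log n)).
    key = operator.itemgetter('state')
    ordered = sorted(iterable, key=key)
    return {state: [d['city'] for d in group]
            for state, group in itertools.groupby(ordered, key=key)}

it = [
    {'state': '1', 'city': 'a'},
    {'state': '2', 'city': 'c'},
    {'state': '1', 'city': 'b'},
]

print(pack_groupby(it))  # {'1': ['a', 'b'], '2': ['c']}

# Without sorting, the non-adjacent '1' entries land in separate groups:
unsorted_groups = [(k, [d['city'] for d in g])
                   for k, g in itertools.groupby(it, key=operator.itemgetter('state'))]
print(unsorted_groups)  # [('1', ['a']), ('2', ['c']), ('1', ['b'])]
```

The `defaultdict` version avoids both the sort and the adjacency requirement.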
If you need to return an iterable in the same format as requested, you could convert `out` into a list:
```python
def convert(out):
    final = []
    for state, cities in out.items():  # Python 2: use .iteritems()
        final.append({'state': state, 'city': cities})
    return final

# List order follows dict insertion order (guaranteed on Python 3.7+).
convert(pack(it)) == [
    {'state': '1', 'city': ['a', 'b']},
    {'state': '2', 'city': ['c', 'd']},
    {'state': '3', 'city': ['e']}
]
```
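The comment in the first `pack` suggests `defaultdict(set)` as an alternative. A sketch of that variant (the names `pack_unique` and `convert_unique` are hypothetical): the loop must call `.add` instead of `.append`, duplicate cities are dropped, and because sets are unordered the conversion sorts each state's cities to get a deterministic list.

```python
import collections

def pack_unique(iterable):
    # A set per state silently drops duplicate cities.
    out = collections.defaultdict(set)
    for d in iterable:
        out[d['state']].add(d['city'])
    return out

def convert_unique(out):
    # Sets have no order, so sort the cities for a stable result.
    return [{'state': state, 'city': sorted(cities)} for state, cities in out.items()]

it = [
    {'state': '1', 'city': 'b'},
    {'state': '1', 'city': 'a'},
    {'state': '1', 'city': 'b'},  # duplicate, kept only once
    {'state': '2', 'city': 'c'},
]

print(convert_unique(pack_unique(it)))
# [{'state': '1', 'city': ['a', 'b']}, {'state': '2', 'city': ['c']}]
```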
If you have more than just 2 keys in your input, you would need to make the following changes:
```python
import collections

it = [{'state': 'WA', 'city': 'Seattle', 'zipcode': 98101, 'city_population': 9426},
      {'state': 'OR', 'city': 'Portland', 'zipcode': 97225, 'city_population': 24749},
      {'state': 'WA', 'city': 'Spokane', 'zipcode': 99201, 'city_population': 12523}]

def citydata():
    # Fresh record for each new state. A namedtuple is not a drop-in
    # replacement here: it has no string-key access and its fields
    # cannot be reassigned, so the += below would fail.
    return {'city': [], 'zipcode': [], 'state_population': 0}

def pack(iterable):
    out = collections.defaultdict(citydata)
    for d in iterable:
        out[d['state']]['city'].append(d['city'])
        out[d['state']]['zipcode'].append(d['zipcode'])
        out[d['state']]['state_population'] += d['city_population']
    return out

pack(it) == {
    'WA':
        {'city': ['Seattle', 'Spokane'], 'zipcode': [98101, 99201], 'state_population': 21949},
    'OR':
        {'city': ['Portland'], 'zipcode': [97225], 'state_population': 24749}
}
```
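If you prefer attribute access over string keys, a mutable record type works where a namedtuple would not. A minimal sketch using a dataclass (Python 3.7+); the class name `StateData` and the function name `pack_records` are hypothetical:

```python
import collections
from dataclasses import dataclass, field
from typing import List

@dataclass
class StateData:
    # Hypothetical per-state record; unlike a namedtuple, its fields are mutable.
    city: List[str] = field(default_factory=list)
    zipcode: List[int] = field(default_factory=list)
    state_population: int = 0

def pack_records(iterable):
    # StateData() with no arguments is a valid default factory.
    out = collections.defaultdict(StateData)
    for d in iterable:
        rec = out[d['state']]
        rec.city.append(d['city'])
        rec.zipcode.append(d['zipcode'])
        rec.state_population += d['city_population']
    return out

it = [{'state': 'WA', 'city': 'Seattle', 'zipcode': 98101, 'city_population': 9426},
      {'state': 'OR', 'city': 'Portland', 'zipcode': 97225, 'city_population': 24749},
      {'state': 'WA', 'city': 'Spokane', 'zipcode': 99201, 'city_population': 12523}]

packed = pack_records(it)
print(packed['WA'])
# StateData(city=['Seattle', 'Spokane'], zipcode=[98101, 99201], state_population=21949)
```

`field(default_factory=list)` matters here: it gives each record its own lists instead of sharing one mutable default.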
The `convert` function would need to be adjusted accordingly, so that:

```python
convert(pack(it)) == [
    {'state': 'WA', 'city': ['Seattle', 'Spokane'], 'zipcode': [98101, 99201], 'state_population': 21949},
    {'state': 'OR', 'city': ['Portland'], 'zipcode': [97225], 'state_population': 24749}
]
```
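The adjusted `convert` itself is not shown. One way to write it, assuming `pack` returns the per-state dicts built by `citydata` (both are repeated so the block runs on its own): merge each record into a new dict that starts with the `state` key. Putting `state` first only affects display order, since dict equality ignores key order.

```python
import collections

def citydata():
    return {'city': [], 'zipcode': [], 'state_population': 0}

def pack(iterable):
    out = collections.defaultdict(citydata)
    for d in iterable:
        out[d['state']]['city'].append(d['city'])
        out[d['state']]['zipcode'].append(d['zipcode'])
        out[d['state']]['state_population'] += d['city_population']
    return out

def convert(out):
    # Prepend the grouping key to each per-state record (Python 3.5+ syntax).
    return [{'state': state, **data} for state, data in out.items()]

it = [{'state': 'WA', 'city': 'Seattle', 'zipcode': 98101, 'city_population': 9426},
      {'state': 'OR', 'city': 'Portland', 'zipcode': 97225, 'city_population': 24749},
      {'state': 'WA', 'city': 'Spokane', 'zipcode': 99201, 'city_population': 12523}]

result = convert(pack(it))
# result == [
#     {'state': 'WA', 'city': ['Seattle', 'Spokane'], 'zipcode': [98101, 99201], 'state_population': 21949},
#     {'state': 'OR', 'city': ['Portland'], 'zipcode': [97225], 'state_population': 24749}
# ]
```

Because it copies whatever keys the record has, this version keeps working if you add more fields to `citydata`.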
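To maintain the order of the original iterable: from Python 3.7 onward, a plain `dict` (and therefore `defaultdict`) already preserves insertion order. On older versions, use an ordered defaultdict instead of a `defaultdict`. Note that `OrderedDefaultdict` is not part of the standard library, so you would have to write one, for example by subclassing `collections.OrderedDict` and implementing `__missing__`.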
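There is no `OrderedDefaultdict` in the standard library, so here is a minimal sketch of one, assuming you only need `defaultdict`'s auto-creation behaviour (the class name `OrderedDefaultDict` and the function `pack_ordered` are hypothetical; copying and pickling are not customized). It relies on `dict.__getitem__` calling `__missing__` on subclasses, and uses the two-argument `super()` so it also runs on Python 2.

```python
import collections

class OrderedDefaultDict(collections.OrderedDict):
    """Hypothetical helper: an OrderedDict that creates missing values like defaultdict."""

    def __init__(self, default_factory=None, *args, **kwargs):
        super(OrderedDefaultDict, self).__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __missing__(self, key):
        # Called by dict.__getitem__ when the key is absent.
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value

def pack_ordered(iterable):
    out = OrderedDefaultDict(list)
    for d in iterable:
        out[d['state']].append(d['city'])
    return out

it = [{'state': '2', 'city': 'c'},
      {'state': '1', 'city': 'a'},
      {'state': '2', 'city': 'd'}]

print(list(pack_ordered(it).items()))  # [('2', ['c', 'd']), ('1', ['a'])]
```

States appear in the order they are first seen in the input, regardless of Python version.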