Sum nested lists based on condition in Python

Question

I have a nested list looking like this:

[['Vienna','2012', 890,503,70],['London','2014', 5400, 879,78],
 ['London','2014',4800,70,90],['Bern','2013',300,450,678], 
 ['Vienna','2013', 700,850,90], ['Bern','2013',500,700,90]]

What I want to do is sum the integer values of the sublists element-wise whenever city and year are equal. I first thought of a dictionary with city and year as the key, but it caused problems when sorting.

Then I had: {('Vienna','2012'):[890,503,70],('Bern','2013'):[800,1150,768],...}

I also tried something like this:

[sum(x) for x in zip(*list) if x[0] == x[0]] but of course it did not work.

Can I do something like this with a nested list, so that sorting by city and year would be easier?

niemmi · Accepted Answer · 2016-12-10T10:47:44.543

You could construct a result dict where each key is a tuple of the first two items of a sublist and each value is the list of numbers. Every time you add a value to the dict, you can use get to return either the existing element or a given default value, in this case an empty list.

Once you have the existing list and the list to add, you can use zip_longest with fillvalue to pair up the numbers from both lists. zip_longest returns tuples of length 2 containing one number from each list. If one list is longer than the other, fillvalue is used as the default, so this also works when the lists have different lengths. Finally, a list comprehension can be used to sum each pair for the new value:

from itertools import zip_longest

l = [
    ['Vienna','2012', 890,503,70],['London','2014', 5400, 879,78],
    ['London','2014',4800,70,90],['Bern','2013',300,450,678],
    ['Vienna','2013', 700,850,90], ['Bern','2013',500,700,90]
]

res = {}
for x in l:
    key = tuple(x[:2])
    res[key] = [i + j for i, j in zip_longest(res.get(key, []), x[2:], fillvalue=0)]

print(res)

Output:

{('Vienna', '2013'): [700, 850, 90], ('London', '2014'): [10200, 949, 168], 
 ('Vienna', '2012'): [890, 503, 70], ('Bern', '2013'): [800, 1150, 768]}
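To see what fillvalue does when the lists differ in length, here is a minimal sketch with made-up numbers. The two short lists are hypothetical and not part of the question's data:

```python
from itertools import zip_longest

existing = [1, 2, 3]   # hypothetical running totals
incoming = [10, 20]    # hypothetical shorter row

# zip_longest pads the shorter list with fillvalue, so no column is dropped
padded = [i + j for i, j in zip_longest(existing, incoming, fillvalue=0)]

# plain zip stops at the shorter list and loses the third column
truncated = [i + j for i, j in zip(existing, incoming)]

print(padded)     # [11, 22, 3]
print(truncated)  # [11, 22]
```

With the question's data every row has three numbers, so zip and zip_longest would give the same totals; the difference only shows up with ragged rows.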

If you want to sort the cities alphabetically and the years latest first, you could pass a custom key to sorted:

for item in sorted(res.items(), key=lambda x: (x[0][0], -int(x[0][1]))):
    print(item)

Output:

(('Bern', '2013'), [800, 1150, 768])
(('London', '2014'), [10200, 949, 168])
(('Vienna', '2013'), [700, 850, 90])
(('Vienna', '2012'), [890, 503, 70])
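If the result should end up as a nested list again, as in the question, the sorted items can be flattened back. This is a minimal sketch, assuming a dict of totals shaped like the one printed earlier; it is repeated here so the snippet runs on its own, and the names ordered and rows are only for illustration:

```python
res = {
    ('Vienna', '2013'): [700, 850, 90],
    ('London', '2014'): [10200, 949, 168],
    ('Vienna', '2012'): [890, 503, 70],
    ('Bern', '2013'): [800, 1150, 768],
}

# City ascending, year descending; the year is a string, so convert before negating
ordered = sorted(res.items(), key=lambda x: (x[0][0], -int(x[0][1])))

# Rebuild [city, year, n1, n2, n3] rows from ((city, year), [numbers]) pairs
rows = [[city, year, *nums] for (city, year), nums in ordered]

for row in rows:
    print(row)
```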

This looks very good. Use `res = collections.OrderedDict()` to improve. — Gribouillis, Dec 10 '16 at 10:26
Also with `defaultdict(list)` you can just write `res[key]` instead of `res.get(key, [])` — Poloq, Dec 10 '16 at 10:39
@Gribouillis: This depends entirely on the ordering of expected output. If items are needed in the same order as in original list then it would be useful. — niemmi, Dec 10 '16 at 10:43
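The defaultdict suggestion from the comments could look roughly like this. It is a sketch of that one change applied to the same loop, not a different algorithm:

```python
from collections import defaultdict
from itertools import zip_longest

l = [
    ['Vienna', '2012', 890, 503, 70], ['London', '2014', 5400, 879, 78],
    ['London', '2014', 4800, 70, 90], ['Bern', '2013', 300, 450, 678],
    ['Vienna', '2013', 700, 850, 90], ['Bern', '2013', 500, 700, 90],
]

res = defaultdict(list)  # a missing key starts out as an empty list
for x in l:
    key = tuple(x[:2])
    # res[key] replaces res.get(key, []) from the plain-dict version
    res[key] = [i + j for i, j in zip_longest(res[key], x[2:], fillvalue=0)]

print(dict(res))
```

On Python 3.7 and later a plain dict (and therefore a defaultdict) keeps insertion order, so OrderedDict mainly matters on older versions.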

score 2 · Answer 2 · edited Dec 07 '19 at 13:26

You can achieve the result you want by simply using a dictionary to store the running sums for each city and year. Each key in the dictionary is a tuple of the city name and the corresponding year.

Ex: key = (city, year).

This gives us the unique keys that we need to group the rows by.

L = [
    ['Vienna','2012', 890,503,70],['London','2014', 5400, 879,78],
    ['London','2014',4800,70,90],['Bern','2013',300,450,678],
    ['Vienna','2013', 700,850,90], ['Bern','2013',500,700,90]
]

countries = {}

for row in L:  # named row to avoid shadowing the built-in list
    key = tuple(row[0:2])
    values = row[2:]
    if key in countries:
        countries[key] = [sum(v) for v in zip(countries[key], values)]
    else:
        countries[key] = values

print(countries)

out:

 {
     ('Vienna', '2012'): [890, 503, 70],
     ('London', '2014'): [10200, 949, 168],
     ('Bern', '2013'): [800, 1150, 768],
     ('Vienna', '2013'): [700, 850, 90]
}

score 0 · Answer 3 · answered Dec 10 '16 at 10:23

You should maintain a dictionary as you have outlined in the question. Something like this will help,

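cities = {}
for row in lst:  # lst is the nested list from the question
    city_key = tuple(row[:2])  # (city, year); a tuple is hashable, a list slice is not
    if city_key in cities:
        cities[city_key] = [a + b for a, b in zip(row[2:], cities[city_key])]
    else:
        cities[city_key] = row[2:]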

martianwars


OK, I already did it nearly the way you did. The problem is that I need to sort the output. It should look like this: Vienna 2014 .... 2013 .... What would be the easiest way to do this? – IamnotaRabbit Dec 10 '16 at 10:28
You can easily sort the keys http://stackoverflow.com/questions/9001509/how-can-i-sort-a-dictionary-by-key – martianwars Dec 10 '16 at 10:30

score 0 · Answer 4 · answered Dec 10 '16 at 10:23

One way is to split the list of lists into a dict by the key you want (the city and year). A defaultdict also helps with squashing all the numbers for a key into one flat list:

>>> from collections import defaultdict
>>> dct = defaultdict(list)
>>> for item in lst:
...    dct[(item[0], item[1])].extend(item[2:])

Now dct has the integers grouped by the city and year:

>>> dct
defaultdict(<type 'list'>, {('Vienna', '2013'): [700, 850, 90], ('London', '2014'): [5400, 879, 78, 4800, 70, 90], ('Vienna', '2012'): [890, 503, 70], ('Bern', '2013'): [300, 450, 678, 500, 700, 90]})

And you can just sum them:

>>> for key in dct:
...    print(key, sum(dct[key]))
... 
(('Vienna', '2013'), 1640)
(('London', '2014'), 11317)
(('Vienna', '2012'), 1463)
(('Bern', '2013'), 2718)
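The totals above add every number for a key into one grand total. If per-column sums are wanted instead, as the question describes, one possible variation is to append whole rows and sum column by column. This is a sketch under that assumption; the names lst and totals are illustrative:

```python
from collections import defaultdict

lst = [
    ['Vienna', '2012', 890, 503, 70], ['London', '2014', 5400, 879, 78],
    ['London', '2014', 4800, 70, 90], ['Bern', '2013', 300, 450, 678],
    ['Vienna', '2013', 700, 850, 90], ['Bern', '2013', 500, 700, 90],
]

dct = defaultdict(list)
for item in lst:
    # append keeps each row intact instead of flattening it with extend
    dct[(item[0], item[1])].append(item[2:])

# zip(*rows) transposes the rows, so each tuple holds one column
totals = {key: [sum(col) for col in zip(*rows)] for key, rows in dct.items()}
print(totals)
```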

宏杰李 · Answer 5 · 2016-12-10T10:46:39.870

nl = [['Vienna','2012', 890,503,70],['London','2014', 5400, 879,78],
      ['London','2014',4800,70,90],['Bern','2013',300,450,678],
      ['Vienna','2013', 700,850,90], ['Bern','2013',500,700,90]]
d = {}
for l in nl:
    key = l[0], l[1]
    value = l[2:]
    if key not in d:
        d[key] = value
    else:
        d[key] = [sum(i) for i in zip(d[key], value)]
print(d)

out:

{('Vienna', '2012'): [890, 503, 70], ('London', '2014'): [10200, 949, 168], ('Bern', '2013'): [800, 1150, 768], ('Vienna', '2013'): [700, 850, 90]}

RomanPerekhrest · Answer 6 · 2016-12-10T10:41:40.560

The solution using itertools.groupby and operator.itemgetter functions:

import itertools, operator

l = [['Vienna','2012', 890,503,70],['London','2014', 5400, 879,78],
 ['London','2014',4800,70,90],['Bern','2013',300,450,678],
 ['Vienna','2013', 700,850,90], ['Bern','2013',500,700,90]]

getter = operator.itemgetter(0, 1)  # the sequence to be grouped(first two items)
summed = [[k[0],k[1],sum(sum(d[2:]) for d in list(group))]
          for k, group in itertools.groupby(sorted(l, key=getter), getter)]

print(summed)

The output:

[['Bern', '2013', 2718], ['London', '2014', 11317], ['Vienna', '2012', 1463], ['Vienna', '2013', 1640]]
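The sorted call matters here because groupby only merges adjacent items with equal keys. The sketch below shows that on the same data, and also keeps per-column totals instead of one grand total per group. The names unsorted_keys and summed_columns are illustrative:

```python
import itertools
import operator

l = [
    ['Vienna', '2012', 890, 503, 70], ['London', '2014', 5400, 879, 78],
    ['London', '2014', 4800, 70, 90], ['Bern', '2013', 300, 450, 678],
    ['Vienna', '2013', 700, 850, 90], ['Bern', '2013', 500, 700, 90],
]

getter = operator.itemgetter(0, 1)  # group by (city, year)

# Without sorting, the two 'Bern' rows are not adjacent and form separate groups
unsorted_keys = [k for k, _ in itertools.groupby(l, getter)]

# With sorting, each (city, year) appears once; zip(*...) sums column by column
summed_columns = []
for (city, year), group in itertools.groupby(sorted(l, key=getter), getter):
    columns = zip(*(row[2:] for row in group))
    summed_columns.append([city, year, *(sum(col) for col in columns)])

print(len(unsorted_keys))  # 5 groups for 4 distinct keys
print(summed_columns)
```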
