Nested list summation by group in Python

Question

I have a nested list of the form [[county, political party, votes received]], with the data types string, string, and int.

How do I take a nested list and do summations by political party? I would like to have a table that compares all of the different political parties and has their total vote counts.

I know that I can just use a dict or pandas (groupby), but I would like to learn how to do this without them. I cannot find any questions that directly relate to this situation.

What is the type of `political party` here? I assume a list of some kind of elements — rv.kvetch, Sep 19 '21 at 13:49
the description of your data is incomplete, but maybe these answers work for you: https://stackoverflow.com/questions/25184415/summation-of-specific-items-in-a-nested-list ; https://stackoverflow.com/questions/952914/how-to-make-a-flat-list-out-of-a-list-of-lists — braulio, Sep 19 '21 at 13:53
The political party is a string. I have a dataset that lists multiple political parties and the votes they received in each municipality. — New at programming, Sep 19 '21 at 14:21

score 0 · Answer 1 · answered Sep 19 '21 at 13:55

You'll need to iterate through all the sub-lists, and store their sum in a map:

sums = {}
for i in big_list:
    _, party, votes = i # based on the question
    sums[party] = sums.get(party, 0) + votes # if it already has a summation
                                             # just get it, otherwise start 
                                             # from a summation of zero

# to get them, just iterate over the map
for party, total_votes in sums.items():
    print(party, total_votes)
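
For anyone who wants to try the loop above end to end, here is a minimal sketch. The sample rows and the column widths are made up for illustration; only the `[county, party, votes]` layout comes from the question.

```python
# Hypothetical sample rows in the question's layout: [county, party, votes]
big_list = [
    ["Adams", "Red", 120],
    ["Adams", "Blue", 95],
    ["Baker", "Red", 80],
    ["Baker", "Green", 15],
    ["Clark", "Blue", 210],
]

# Accumulate one running total per party
sums = {}
for _, party, votes in big_list:
    sums[party] = sums.get(party, 0) + votes

# Order the (party, total) pairs by total, largest first
table_rows = sorted(sums.items(), key=lambda item: item[1], reverse=True)

# Print a simple two-column table: party left-aligned, total right-aligned
for party, total in table_rows:
    print(f"{party:<10}{total:>8}")
```

Sorting the `(party, total)` pairs is optional; it just makes the comparison table easier to read.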

Thank you very much this worked :) I appreciate the help sir. — New at programming, Sep 19 '21 at 14:34

Alain T. · Answer 2 · 2021-09-19T17:28:17.593

A dictionary is going to be more efficient (it needs only one pass over the data), but there are other (slower) approaches.

Sorting for example:

totalList = []
for _,party,votes in sorted(voteList,key=lambda v:v[1]):
    if not totalList or totalList[-1][0] != party:
        totalList.append([party,votes])
    else:
        totalList[-1][1] += votes
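
The sort matters here because the loop only ever compares against the last entry in the result. A small sketch of that point, using a hypothetical helper `merge_adjacent` and made-up sample rows (neither is part of the original answer):

```python
# Hypothetical sample rows: [county, party, votes]
vote_rows = [
    ["Adams", "Red", 120],
    ["Adams", "Blue", 95],
    ["Baker", "Red", 80],
    ["Baker", "Green", 15],
    ["Clark", "Blue", 210],
]

def merge_adjacent(rows):
    """Sum votes for consecutive rows that share the same party."""
    merged = []
    for _, party, votes in rows:
        if merged and merged[-1][0] == party:
            merged[-1][1] += votes          # same party as previous row
        else:
            merged.append([party, votes])   # a new run of rows starts here
    return merged

# Without sorting, equal parties are not adjacent, so nothing is merged
unsorted_result = merge_adjacent(vote_rows)

# After sorting by party, each party forms one contiguous run
sorted_result = merge_adjacent(sorted(vote_rows, key=lambda row: row[1]))
print(sorted_result)
```

On this sample the unsorted call returns five separate entries, while the sorted call returns one entry per party.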

Multiple passes using distinct party names:

parties   = {party for _,party,_ in voteList}  # set of distinct parties
totalList = [ [party,sum(votes for _,p,votes in voteList if p==party)]
              for party in parties ]

There is also the Counter class from collections that is a specialized dictionary for this type of thing:

from collections import Counter
totals = Counter()
for _,party,votes in voteList: totals[party] += votes
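
A short sketch of what the Counter gives you afterwards, again on made-up sample rows. `most_common()` returns the pairs ordered by total, which is close to the comparison table asked for in the question.

```python
from collections import Counter

# Hypothetical sample rows: [county, party, votes]
vote_rows = [
    ["Adams", "Red", 120],
    ["Adams", "Blue", 95],
    ["Baker", "Red", 80],
    ["Baker", "Green", 15],
    ["Clark", "Blue", 210],
]

totals = Counter()
for _, party, votes in vote_rows:
    totals[party] += votes   # missing keys start at 0, so no .get() is needed

# (party, total) pairs, highest total first
ranking = totals.most_common()
for party, total in ranking:
    print(party, total)
```

Looking up a party that never appeared (for example `totals["Purple"]`) returns 0 instead of raising `KeyError`.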

amazing!!! thank you for your help!! — New at programming, Sep 19 '21 at 17:25

score 0 · Answer 3 · answered May 22 '22 at 21:46

I grouped the rows by a header column and then summed every column that has a pre-defined numeric data type. I've been trying to avoid nested loops. About the input: headers and datatypes are both lists, and content is a nested list.

import itertools

def group_by_header(headers, content, datatypes, /):
    key_name = "Party"
    key_index = headers.index(key_name)
    sorted_by_header = sorted(content, key=lambda x: x[key_index])
    group_by_header = {}

    # create iterator
    it = iter(sorted_by_header)
    for k, g in itertools.groupby(it, lambda x: x[key_index]):
        def sum_of_nums(column, dt, i):
            if dt[i] == "float":
                tmp = [float(s) for s in column]
                yield sum(tmp)
            elif dt[i] == "int":
                tmp = [int(s) for s in column]
                yield sum(tmp)
            else:
                yield "NA"

        def generate_sum(group):
            index = 0
            for c in zip(*list(group)):
                yield sum_of_nums(c, datatypes, index)
                index += 1

        group_by_header.setdefault(k, [])
        for j in generate_sum(g):
            group_by_header[k].append(next(j))

    return group_by_header
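
For comparison, a condensed sketch of the same sort-then-`itertools.groupby` idea together with sample input. The function name `sum_columns_by`, the `casts` lookup, and the sample data are assumptions made for illustration, not part of the answer above.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical input in the shape the answer describes
headers = ["County", "Party", "Votes"]
datatypes = ["str", "str", "int"]
content = [
    ["Adams", "Red", 120],
    ["Adams", "Blue", 95],
    ["Baker", "Red", 80],
    ["Baker", "Green", 15],
    ["Clark", "Blue", 210],
]

def sum_columns_by(key_name, headers, content, datatypes):
    """Group rows by one column and sum every numeric column per group."""
    get_key = itemgetter(headers.index(key_name))
    casts = {"int": int, "float": float}   # data types that can be summed
    result = {}
    # groupby only merges adjacent rows, so sort by the same key first
    for key, group in groupby(sorted(content, key=get_key), key=get_key):
        columns = zip(*group)               # transpose rows into columns
        result[key] = [
            sum(casts[dt](value) for value in column) if dt in casts else "NA"
            for column, dt in zip(columns, datatypes)
        ]
    return result

totals = sum_columns_by("Party", headers, content, datatypes)
print(totals)
```

Each party maps to one list with an entry per column: a sum for numeric columns and "NA" for the rest, which mirrors the output shape of the function above.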
