A loop that makes multi-conditional summations

Question

I have a data frame of the form:

df = [["john","2019","30.2"] , ["john","2019","40"] , ["john","2020","50.3"] , 
      ["amy","2019","60"] , ["amy","2019","20"] , ["amy","2020","40.1"]]

My desired result is a list in which the last element is summed over all rows whose first two elements are equal:

> [["john", "2019", "70.2"] ,  ["john","2020","50.3"] , ["amy","2019","80"] , ["amy","2020","40.1"]]

What I tried was a for loop that checks both conditions for equality and, if both are true, sums up the last element. This is rough pseudo-code:

for i in df[i]:
   if df[i][0] == df[i+1][0] and df[i][1] == df[i+1][1]: #if both conditions are true
      sum1 = sum(float(df[i][2]))
      lst = []
      lst.append(df[i][0])
      lst.append(df[i][1])
      lst.append(str(sum1))

Edit: Would appreciate a solution that doesn't use packages.

Is there any more data? If yes, are they in the same format - 'name, year, third index'? — Swagrim, Dec 18 '21 at 10:50

Rahul Kumar · Answer 1 · 2021-12-18T10:25:11.247

1

Since you named the variable df, I assume you are familiar with pandas.

This is easy to do in pandas: just convert your list into a DataFrame.

Then group by the columns whose combinations should be unique and select the last row of each group (for totals, .sum() on a numeric column would take the place of .last()):

df.groupby(['col_a', 'col_b'], as_index=False).last()

If you have any custom ordering logic, sort the DataFrame before calling groupby.

edited Dec 18 '21 at 10:25

answered Dec 18 '21 at 10:19

Rahul Kumar


thanks a lot, but is there any way to do it without any packages? – Shoshan Ben-Tzvi Dec 18 '21 at 10:24

frippe · Answer 2 · 2021-12-18T10:32:37.233

1

Here's a way to do it using defaultdict:

from collections import defaultdict
sums = defaultdict(lambda: defaultdict(float))
for item in df:
    sums[item[0]][item[1]] += float(item[2])
lst = [[key, inner_key, value] for key in sums for inner_key, value in sums[key].items()]
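A variant of the same idea, as a sketch rather than part of the original answer: a single defaultdict keyed by the (name, year) tuple avoids the nesting, and the totals are converted back to strings to match the format in the question. The variable names here are illustrative assumptions.

```python
from collections import defaultdict

df = [["john", "2019", "30.2"], ["john", "2019", "40"], ["john", "2020", "50.3"],
      ["amy", "2019", "60"], ["amy", "2019", "20"], ["amy", "2020", "40.1"]]

# One flat mapping keyed by (name, year); missing keys start at 0.0.
sums = defaultdict(float)
for name, year, value in df:
    sums[(name, year)] += float(value)

# Unpack each key and turn the total back into a string, as in the question.
lst = [[name, year, str(total)] for (name, year), total in sums.items()]
print(lst)
```

Because the totals are floats, whole numbers come out with a trailing ".0" (for example "80.0" rather than "80").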

edited Dec 18 '21 at 10:32

answered Dec 18 '21 at 10:27

frippe


score 1 · Accepted Answer · answered Dec 18 '21 at 10:41

The following code doesn't use any packages. Starting with Python 3.7, all dicts preserve insertion order; the code relies on this so that the final result keeps the order in which the elements first appear. If for some reason your Python is older than 3.7, tell me and I'll modify the code to do the ordering explicitly instead of relying on this language feature.

df = [["john","2019","30.2"], ["john","2019","40"], ["john","2020","50.3"],
      ["amy","2019","60"], ["amy","2019","20"], ["amy","2020","40.1"]]

r = {}
for *a, b in df:
    a = tuple(a)
    if a not in r:
        r[a] = 0
    r[a] += float(b)
r = [list(k) + [str(v)] for k, v in r.items()]

print(r)

Output:

[['john', '2019', '70.2'], ['john', '2020', '50.3'], ['amy', '2019', '80.0'], ['amy', '2020', '40.1']]
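For interpreters older than 3.7, the explicit ordering mentioned above might look something like the following. This is only a sketch of one possible approach, not the author's own modification: it assumes a separate list (here called order, a made-up name) that records each key the first time it is seen.

```python
df = [["john", "2019", "30.2"], ["john", "2019", "40"], ["john", "2020", "50.3"],
      ["amy", "2019", "60"], ["amy", "2019", "20"], ["amy", "2020", "40.1"]]

totals = {}  # (name, year) -> running sum
order = []   # keys in order of first appearance

for *key, value in df:
    key = tuple(key)
    if key not in totals:
        totals[key] = 0.0
        order.append(key)  # remember when this key was first seen
    totals[key] += float(value)

# Walk the recorded order instead of relying on dict iteration order.
result = [list(key) + [str(totals[key])] for key in order]
print(result)
```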

gboffi · Answer 4 · 2022-01-15T15:56:40.737

Dictionaries have the convenient setdefault method, which checks whether its first argument is a key of the dictionary. If it is, the method returns the corresponding value; otherwise it inserts the key with the default value and returns that default.

In our case, because we want to sum numerical values, of course the default must be 0.

We use a temporary dictionary, indexed by the tuple (name, year), and when we are finished with the summing we unfold the dictionary data into a list of lists, following the direction you showed in the question's pseudo-code.

In [15]: data = [["john","2019","30.2"] , ["john","2019","40"] , ["john","2020","50.3"] ,
    ...:         ["amy","2019","60"] , ["amy","2019","20"] , ["amy","2020","40.1"]]
    ...: d_temp = {}
    ...: for n, y, v in data:
    ...:     d_temp[(n,y)] = d_temp.setdefault((n,y),0)+float(v)
    ...: lol = [[n, y, v] for (n, y), v in d_temp.items()]
    ...: lol
Out[15]: 
[['john', '2019', 70.2],
 ['john', '2020', 50.3],
 ['amy', '2019', 80.0],
 ['amy', '2020', 40.1]]
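To make the behaviour of setdefault concrete, here is a small standalone sketch (the keys and values are just illustrative). It also contrasts it with dict.get, which returns a default without storing anything.

```python
d = {}

# Key absent: setdefault stores the default and returns it.
first = d.setdefault(("john", "2019"), 0)
print(first, d)

# Key present: the stored value is returned and the default is ignored.
d[("john", "2019")] += 30.2
second = d.setdefault(("john", "2019"), 0)
print(second, d)

# dict.get also returns a default, but never modifies the dictionary.
missing = d.get(("amy", "2019"), 0)
print(missing, ("amy", "2019") in d)
```

Since the loop above assigns the sum back to the same key anyway, d_temp.get((n, y), 0) would presumably work there just as well.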

score 0 · Answer 5 · answered Dec 18 '21 at 11:27

One option, using tools within the standard library:

from itertools import groupby
from decimal import Decimal
from operator import itemgetter

# itertools' groupby requires the data to be sorted
key_func = itemgetter(0,1)
df = sorted(df, key = key_func)

# compute values within the groupby
[[*key, str(sum(Decimal(e) for *_, e in ent))] 
  for key, ent 
  in groupby(df, key = key_func)]

[['amy', '2019', '80'],
 ['amy', '2020', '40.1'],
 ['john', '2019', '70.2'],
 ['john', '2020', '50.3']]
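The sorting step matters because itertools.groupby only merges consecutive rows with equal keys. A minimal sketch, using made-up rows in which the two john/2019 entries are not adjacent (the helper name summed is an assumption for illustration):

```python
from decimal import Decimal
from itertools import groupby
from operator import itemgetter

# Hypothetical rows where the two "john", "2019" entries are not adjacent.
rows = [["john", "2019", "30.2"], ["amy", "2019", "60"], ["john", "2019", "40"]]
key_func = itemgetter(0, 1)

def summed(data):
    # Sum the last field of each run of consecutive rows sharing a key.
    return [[*key, str(sum(Decimal(v) for *_, v in group))]
            for key, group in groupby(data, key=key_func)]

unsorted_result = summed(rows)                       # john/2019 appears twice
sorted_result = summed(sorted(rows, key=key_func))   # one row per key
print(unsorted_result)
print(sorted_result)
```

Without sorting, john/2019 ends up as two separate groups. Sorting fixes that at the cost of losing the original order of appearance, which is why the output above starts with amy.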
