How to sum an ND array in python based on like entries?

Question

Let's say I have an ND array in Python with the following layout:

# columns: ["Event ID", "Event Location", "Event Cost"]
data = [
    [1, 0, 500],
    [1, 0, 250],
    [1, 1, 300],
    [2, 0, 750],
    [2, 1, 400],
    [2, 1, 500],
]

How can I collapse this array to sum up the cost for entries with the same event ID that happened in the same event location? This would give me the following array at the end:

[[1, 0, 750],
 [1, 1, 300],
 [2, 0, 750],
 [2, 1, 900]]

Does this answer your question? [Is there any numpy group by function?](https://stackoverflow.com/questions/38013778/is-there-any-numpy-group-by-function) — norok2, Dec 11 '19 at 00:27
@MadPhysicist answered my question. Pandas worked perfectly. — Adam G., Dec 11 '19 at 21:24

score 1 · Answer 1 · answered Dec 10 '19 at 23:14


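This is a classic use case for itertools.groupby. Note that groupby only merges consecutive rows that share a key, so the data must already be sorted by (event ID, event location), as it is here: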

import itertools

result = [
    [i, loc, sum(cost for _, _, cost in costs)]
    for (i, loc), costs in itertools.groupby(data, key=lambda t: (t[0], t[1]))
]
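
If the rows are not already ordered by key, the same idea still applies once the data is sorted first. The following is a minimal sketch under that assumption; the shuffled `data` and the `key` helper are illustrative and not part of the original answer.

```python
import itertools
from operator import itemgetter

# Same rows as in the question, deliberately shuffled (illustrative input).
data = [
    [2, 1, 400],
    [1, 0, 500],
    [2, 0, 750],
    [1, 1, 300],
    [1, 0, 250],
    [2, 1, 500],
]

# Group on (event ID, event location); sort with the same key first,
# because groupby only merges adjacent rows with equal keys.
key = itemgetter(0, 1)
result = [
    [event_id, loc, sum(row[2] for row in rows)]
    for (event_id, loc), rows in itertools.groupby(sorted(data, key=key), key=key)
]
print(result)  # [[1, 0, 750], [1, 1, 300], [2, 0, 750], [2, 1, 900]]
```

Sorting costs O(n log n), which is usually acceptable for plain Python lists of moderate size.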


kaya3


Is it still a classic use case if you're grouping over a numpy array (I don't think so...). – cs95 Dec 10 '19 at 23:15

I didn't say it doesn't work, just that it's not very performant. – cs95 Dec 10 '19 at 23:18
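
For data that already lives in a NumPy array, one vectorized alternative is to combine `np.unique` with `np.add.at`. This is only a sketch of that approach, not something proposed in the thread; the variable names `keys`, `inverse` and `sums` are illustrative.

```python
import numpy as np

a = np.array([
    [1, 0, 500],
    [1, 0, 250],
    [1, 1, 300],
    [2, 0, 750],
    [2, 1, 400],
    [2, 1, 500],
])

# Unique (event ID, location) rows, plus the group index of every input row.
keys, inverse = np.unique(a[:, :2], axis=0, return_inverse=True)

# Accumulate each row's cost into its group's slot.
# ravel() guards against NumPy versions that return a 2-D inverse with axis=0.
sums = np.zeros(len(keys), dtype=a.dtype)
np.add.at(sums, inverse.ravel(), a[:, 2])

result = np.column_stack([keys, sums])
print(result)
```

The result rows come out sorted by key, since `np.unique` sorts the unique rows.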

score 1 · Answer 2 · answered Dec 11 '19 at 00:25

I prefer two ways to do it:

Using `numpy_indexed` package:

import numpy as np
import numpy_indexed as npi
a = np.array([[1, 0, 500],[1, 0, 250],[1, 1, 300],[2, 0, 750],[2, 1, 400],[2, 1, 500]])
_, sums = npi.group_by(a[:,:2]).sum(a[:,2])
result = np.hstack([_, np.vstack(sums)])
print(result)

Output:

_ = 
[[1 0]
 [1 1]
 [2 0]
 [2 1]]
sums = [750, 300, 750, 900]
np.vstack(sums) = 
[[750]
 [300]
 [750]
 [900]]
result = 
[[  1   0 750]
 [  1   1 300]
 [  2   0 750]
 [  2   1 900]]

Using `pandas`:

import pandas as pd
result = pd.DataFrame(a).groupby([0, 1]).sum().reset_index().to_numpy()

score 0 · Accepted Answer · answered Dec 11 '19 at 21:26


I used Pandas and the following line to solve this:

dg = data.groupby(['Event ID', 'Event Location'])['Event Cost'].sum().reset_index()
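
The line above assumes `data` is a pandas DataFrame whose columns carry the names from the question. A minimal, self-contained sketch under that assumption (how the DataFrame is built here is illustrative, not taken from the answer):

```python
import pandas as pd

# Assumed construction of the DataFrame; column names come from the question.
data = pd.DataFrame(
    [
        [1, 0, 500],
        [1, 0, 250],
        [1, 1, 300],
        [2, 0, 750],
        [2, 1, 400],
        [2, 1, 500],
    ],
    columns=["Event ID", "Event Location", "Event Cost"],
)

# Sum the cost within each (event ID, event location) group,
# then turn the group keys back into ordinary columns.
dg = data.groupby(["Event ID", "Event Location"])["Event Cost"].sum().reset_index()
print(dg)

# Convert back to a plain array if needed.
result = dg.to_numpy()
```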


Adam G.

