Let's say I have a 2D array in Python whose columns follow this scheme:

# Columns: ["Event ID", "Event Location", "Event Cost"]
data = [
    [1, 0, 500],
    [1, 0, 250],
    [1, 1, 300],
    [2, 0, 750],
    [2, 1, 400],
    [2, 1, 500],
]

How can I collapse this array to sum up the cost for entries with the same event ID that happened in the same event location? This would give me the following array at the end:

[[1, 0, 750],
 [1, 1, 300],
 [2, 0, 750],
 [2, 1, 900]]
Adam G.

3 Answers


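This is a classic use case for itertools.groupby (the rows must already be sorted by the grouping key, as they are here):

import itertools

# Group rows by (event ID, location) and sum the cost column of each group
result = [
    [i, loc, sum(cost for _, _, cost in costs)]
    for (i, loc), costs in itertools.groupby(data, key=lambda t: (t[0], t[1]))
]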
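
If the rows are not guaranteed to be ordered by event ID and location, sorting by the same key first keeps the approach correct, because itertools.groupby only merges adjacent rows with equal keys. A minimal sketch, assuming shuffled input (the names rows, group_key and totals are illustrative, not from the answer):

```python
import itertools

# Same rows as in the question, deliberately shuffled.
rows = [
    [2, 1, 400],
    [1, 0, 500],
    [2, 0, 750],
    [1, 1, 300],
    [2, 1, 500],
    [1, 0, 250],
]


def group_key(row):
    """Return the (event ID, location) pair used for sorting and grouping."""
    return (row[0], row[1])


# Sort first so that rows sharing a key become adjacent, then group and sum.
totals = [
    [event_id, loc, sum(row[2] for row in group)]
    for (event_id, loc), group in itertools.groupby(
        sorted(rows, key=group_key), key=group_key
    )
]
print(totals)
```

Using one function for both the sort and the grouping keeps the two steps from drifting apart.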
kaya3

There are two ways I like to do this:

Using numpy_indexed package:

import numpy as np
import numpy_indexed as npi

a = np.array([[1, 0, 500], [1, 0, 250], [1, 1, 300], [2, 0, 750], [2, 1, 400], [2, 1, 500]])
# Group rows by the first two columns and sum the third column within each group
keys, sums = npi.group_by(a[:, :2]).sum(a[:, 2])
# Turn the sums into a column and append it to the unique keys
result = np.hstack([keys, np.vstack(sums)])
print(result)

Output (with intermediate values):

keys =
[[1 0]
 [1 1]
 [2 0]
 [2 1]]
sums = [750 300 750 900]
np.vstack(sums) =
[[750]
 [300]
 [750]
 [900]]
result =
[[  1   0 750]
 [  1   1 300]
 [  2   0 750]
 [  2   1 900]]
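
If installing numpy_indexed is not an option, the same grouping can probably be done with plain NumPy. The sketch below swaps in np.unique and np.add.at for the package used above; it is an alternative technique, not what numpy_indexed does internally, and the variable names are illustrative:

```python
import numpy as np

a = np.array([[1, 0, 500], [1, 0, 250], [1, 1, 300],
              [2, 0, 750], [2, 1, 400], [2, 1, 500]])

# Unique (event ID, location) rows, plus the group index of every input row.
keys, inverse = np.unique(a[:, :2], axis=0, return_inverse=True)

# Accumulate each row's cost into the slot of its group.
# ravel() guards against NumPy versions that return a 2D inverse array.
sums = np.zeros(len(keys), dtype=a.dtype)
np.add.at(sums, inverse.ravel(), a[:, 2])

result = np.column_stack([keys, sums])
print(result)
```

np.unique returns the keys in sorted order, so the input rows do not need to be sorted beforehand.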

Using pandas:

import pandas as pd
result = pd.DataFrame(a).groupby([0, 1]).sum().reset_index().values
mathfux

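I solved this with pandas. With the data loaded into a DataFrame whose columns are named 'Event ID', 'Event Location' and 'Event Cost', the following line does it: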

dg = data.groupby(['Event ID', 'Event Location'])['Event Cost'].sum().reset_index()
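
For completeness, here is a self-contained sketch of that one-liner, assuming the rows start out as the plain list of lists from the question. The names rows, df and totals are illustrative:

```python
import pandas as pd

rows = [
    [1, 0, 500],
    [1, 0, 250],
    [1, 1, 300],
    [2, 0, 750],
    [2, 1, 400],
    [2, 1, 500],
]

# Name the columns so the groupby can refer to them by label.
df = pd.DataFrame(rows, columns=["Event ID", "Event Location", "Event Cost"])

# Sum the cost per (event ID, location) pair, then turn the group keys
# back into ordinary columns.
totals = (
    df.groupby(["Event ID", "Event Location"])["Event Cost"]
    .sum()
    .reset_index()
)
print(totals)
```

Calling totals.values afterwards gives the collapsed rows back as a NumPy array.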
Adam G.