Numpy combining values within a 2-D array based on index

Question

I have a NumPy array in the format [[x,y],[x,y]...], and I would like to combine (sum) the y values of all rows that share the same x value.

Example array = [[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0]]

I would like this to become newArray = [[0,0],[1,10],[2,6],[3,7],[4,22],[5,3]]; it doesn't have to be ordered.

As of now I can't think of a simple and efficient way to do this. It might help to add that my actual array uses timestamps as its x values, such as Timestamp('2018-05-05 00:00:00'), and its size is 183,083, which isn't too large.

Any help appreciated!


score 1 · Answer 1 · answered May 07 '18 at 10:56

If performance is an issue, pure NumPy solutions are available; see the question "Sum array by number in numpy".
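A minimal pure-NumPy sketch of that idea, assuming the x column can be sorted by np.unique (true for integers and for datetime64 values). np.unique with return_inverse=True maps each row to the index of its group, and np.add.at accumulates the y values into one slot per group. The name group_sum is a hypothetical helper, not part of NumPy.

```python
import numpy as np

def group_sum(arr):
    """Sum column 1 of a 2-D array for each distinct value in column 0 (hypothetical helper)."""
    x, y = arr[:, 0], arr[:, 1]
    # keys: sorted distinct x values; inverse[i] is the position of x[i] in keys
    keys, inverse = np.unique(x, return_inverse=True)
    sums = np.zeros(len(keys), dtype=y.dtype)
    # Unbuffered in-place add, so repeated group indices all accumulate
    np.add.at(sums, inverse, y)
    return np.column_stack((keys, sums))

A = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0]])
print(group_sum(A))
```

np.add.at is used rather than sums[inverse] += y because the buffered form applies only one update per repeated index, which would silently drop values.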

Below is a dictionary-based approach using collections.defaultdict. It iterates over each row of your array and sums the y values by key (the x value).

import numpy as np
from collections import defaultdict

A = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0]])

d = defaultdict(int)
for i, j in A:
    # i is the x value (the key), j is the y value to accumulate
    d[i] += j

# Sort by key and convert back to a 2-D array
res = np.array(sorted(d.items()))

print(res)

[[ 0  0]
 [ 1 10]
 [ 2  6]
 [ 3  7]
 [ 4 22]
 [ 5  3]]
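Because the dictionary approach only needs hashable keys, it should also cover the timestamp case from the question. A sketch, assuming the data can be iterated as (timestamp, value) pairs; standard-library datetime objects stand in here for pandas Timestamp values (which are also hashable), and the sample readings are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data: (timestamp, value) pairs with repeated timestamps
rows = [
    (datetime(2018, 5, 5, 0, 0), 4),
    (datetime(2018, 5, 5, 1, 0), 6),
    (datetime(2018, 5, 5, 0, 0), 2),
    (datetime(2018, 5, 5, 1, 0), 16),
    (datetime(2018, 5, 6, 0, 0), 7),
]

totals = defaultdict(int)
for ts, value in rows:
    totals[ts] += value  # any hashable key works, not just integers

# Sort by timestamp so the result is in chronological order
result = sorted(totals.items())
for ts, total in result:
    print(ts, total)
```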

score 1 · Answer 2 · answered May 07 '18 at 11:16

Here is an example using collections.Counter:

import numpy as np
from collections import Counter

ar = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0], [20,0]])

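# Count how often each x value occurs (.items() replaces the Python 2-only .iteritems())
counts = Counter(ar[:,0])
repeated = [item for item, count in counts.items() if count > 1]
# Row indices whose x value occurs only once; these rows are kept unchanged
non_repeated = [row for row in range(len(ar)) if ar[row, 0] not in repeated]

new_arr = []
for element in repeated:
    # Sum only the y values of the rows sharing this x (works for any number of repeats)
    new_arr.append([element, ar[ar[:,0] == element, 1].sum()])
new_arr = np.array(new_arr, dtype=ar.dtype).reshape(-1, 2)
# Append the non-repeated rows instead of overwriting the summed ones
new_arr = np.vstack([new_arr, ar[non_repeated]])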
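Counter can also serve directly as the accumulator, which avoids the repeated/non-repeated split entirely: a Counter returns 0 for missing keys, so every x value is handled the same way whether it occurs once or many times. A minimal sketch on the same ar array:

```python
import numpy as np
from collections import Counter

ar = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0],[20,0]])

totals = Counter()
for x, y in ar:
    # Missing keys read as 0, so += also works on the first occurrence of x
    totals[int(x)] += int(y)

# Sort by x and convert back to a 2-D array
new_arr = np.array(sorted(totals.items()))
print(new_arr)
```

Keys whose total is 0 (such as x = 0 and x = 20 here) are kept, since assigning through += stores the zero explicitly.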

score 0 · Answer 3 · answered May 07 '18 at 11:12

This is a typical grouping operation. NumPy does not support these cleanly out of the box, but the numpy-indexed package does (disclaimer: I am its author):

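import numpy as np
import numpy_indexed as npi

A = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0]])
# Group the y column by the x column and sum within each group
keys, sums = npi.group_by(A[:, 0]).sum(A[:, 1])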

This solution is fully vectorized, so there are no Python-level for loops over the array, and it generalizes to many other grouping scenarios. The package can be installed using pip or conda.
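For readers who cannot add a dependency, here is a rough pure-NumPy approximation of a vectorized group-by sum. It is a sketch, not necessarily how numpy-indexed implements it: sort by key with a stable argsort, find where each run of equal keys starts, and let np.add.reduceat sum each run. It uses datetime64 keys to mirror the Timestamp x values from the question; the sample values are made up, and it assumes at least one row (reduceat rejects empty index arrays).

```python
import numpy as np

# Hypothetical data: datetime64 keys with repeats, and integer values
keys = np.array(['2018-05-05T00:00', '2018-05-05T01:00', '2018-05-05T00:00',
                 '2018-05-06T00:00', '2018-05-05T01:00'], dtype='datetime64[m]')
values = np.array([4, 6, 2, 7, 16])

# Sort both arrays by key so equal keys form contiguous runs
order = np.argsort(keys, kind='stable')
sorted_keys = keys[order]
sorted_values = values[order]

# A new group starts at index 0 and wherever the sorted key changes
starts = np.flatnonzero(np.r_[True, sorted_keys[1:] != sorted_keys[:-1]])

unique_keys = sorted_keys[starts]
# Sum each run [starts[k], starts[k+1]) in a single vectorized call
sums = np.add.reduceat(sorted_values, starts)
print(unique_keys)
print(sums)
```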
