
I have a NumPy array in the format [[x,y],[x,y]...], and I would like to combine (sum) the y values of all rows that share the same x.

Example array = [[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0]]

I would like this to become newArray = [[0,0],[1,10],[2,6],[3,7],[4,22],[5,3]] - it doesn't have to be ordered

As of now I can't think of a simple, efficient way to do this. It might help to add that my actual array uses timestamps as the x values, such as Timestamp('2018-05-05 00:00:00'), and is 183083 in size, which isn't too bad.

Any help appreciated!
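Because the real x values are pandas Timestamps, pandas is presumably already available, and its groupby does exactly this kind of "sum y per x" operation. The following is a minimal sketch, assuming the data can be put into a DataFrame with columns named x and y; the column names and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: x holds Timestamps, y holds the values to combine
df = pd.DataFrame({
    "x": pd.to_datetime(["2018-05-05", "2018-05-06", "2018-05-05", "2018-05-07"]),
    "y": [1, 2, 3, 4],
})

# Sum y for each distinct x; as_index=False keeps x as an ordinary column
totals = df.groupby("x", as_index=False)["y"].sum()
print(totals)
```

By default groupby sorts the keys, so the result comes out ordered by timestamp. If the data is already an array of [x, y] rows, pd.DataFrame(arr, columns=["x", "y"]) should give the same layout, though the y column may need converting to a numeric dtype first.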

  • Did one of the solutions below help? Feel free to accept one if it did (green tick on left), or ask for clarification. – jpp May 11 '18 at 13:14

3 Answers


If performance is an issue, pure NumPy solutions are available; see "Sum array by number in numpy".
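As a minimal sketch of what such a pure-NumPy approach can look like (this is one common technique, not necessarily the one in the linked question): np.unique maps each x to an index, and np.add.at accumulates the y values into those indices without a Python loop over rows.

```python
import numpy as np

A = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0]])

# keys: sorted unique x values; inverse[i] is the position of A[i, 0] in keys
keys, inverse = np.unique(A[:, 0], return_inverse=True)

# Add each y into the slot of its key; np.add.at handles repeated indices correctly
sums = np.zeros(len(keys), dtype=A.dtype)
np.add.at(sums, inverse, A[:, 1])

res = np.column_stack((keys, sums))
print(res)
```

With datetime64 x values, np.unique should still work, but keys and sums then have different dtypes, so it is simpler to keep them as two separate arrays than to stack them.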

Below is a dictionary-based approach using collections.defaultdict. It works by iterating over each row of the array and summing the values by key.

import numpy as np
from collections import defaultdict

A = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0]])

d = defaultdict(int)
for i, j in A:
    d[i] += j

res = np.array(sorted(d.items()))

print(res)

[[ 0  0]
 [ 1 10]
 [ 2  6]
 [ 3  7]
 [ 4 22]
 [ 5  3]]
jpp

Here is an example using collections.Counter:

import numpy as np
from collections import Counter

ar = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0], [20,0]])

# Count how many times each x value (first column) occurs
counts = Counter(ar[:, 0])
repeated = [item for item, count in counts.items() if count > 1]
non_repeated = [item for item, count in counts.items() if count == 1]

new_arr = []
for element in repeated:
    # Sum the y values (second column) of all rows sharing this x
    new_arr.append([element, ar[ar[:, 0] == element, 1].sum()])
for element in non_repeated:
    # x values that occur only once keep their original row
    new_arr.append(ar[ar[:, 0] == element][0].tolist())
new_arr = np.array(new_arr)
irigo

This is a typical grouping operation. NumPy does not support these cleanly out of the box, but the numpy-indexed package does (disclaimer: I am its author):

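import numpy as np
import numpy_indexed as npi
A = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0]])
keys, sums = npi.group_by(A[:, 0]).sum(A[:, 1])  # unique x values and the per-x sums of y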

This solution is fully vectorized, so there are no Python-level for loops over the array, and it generalizes to many other grouping scenarios. The package can be installed using pip or conda.
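For readers who cannot add a dependency, here is a minimal sketch of one common loop-free grouping technique in plain NumPy: sort by key so equal x values form contiguous runs, then sum each run with np.add.reduceat. This is an illustration of the general idea only; it is not claimed to be how numpy_indexed is implemented internally.

```python
import numpy as np

A = np.array([[0,0],[1,1],[2,4],[4,6],[2,2],[3,7],[1,9],[4,16],[5,1],[5,2],[0,0]])

# Sort rows by x so that equal keys sit next to each other
order = np.argsort(A[:, 0], kind="stable")
sorted_keys = A[order, 0]
sorted_vals = A[order, 1]

# Index at which each run of equal keys begins
starts = np.flatnonzero(np.r_[True, sorted_keys[1:] != sorted_keys[:-1]])

keys = sorted_keys[starts]
# Sum the values inside each run in a single vectorized call
sums = np.add.reduceat(sorted_vals, starts)
print(keys, sums)
```

The sort costs O(n log n), which should be unproblematic for an array of roughly 183,000 rows.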

Eelco Hoogendoorn