I have a 2-dimensional NumPy array in Python:

[[ 1  2  1  3  3]
 [10 20 30 40 60]]

I would like to keep only the unique values in the first row, summing the corresponding second-row values for each duplicate before the duplicate columns are deleted. For my array, the output would be:

[[  1   2   3 ]
 [ 40  20 100 ]]

I'm new to Python and can't think of an efficient way to do this at larger scales.

Justin Lange

4 Answers

Unfortunately, NumPy doesn't have a built-in groupby function (though there are ways to write one). If you're open to using pandas, this is straightforward:

>>> import pandas as pd
>>> pd.DataFrame(a.T).groupby(0, as_index=False).sum().values.T
array([[  1,   2,   3],
       [ 40,  20, 100]])
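
As for writing the groupby yourself in plain NumPy, one possible sketch (not part of the pandas approach above) combines `np.unique(..., return_inverse=True)` with `np.add.at`. The variable names `keys`, `inverse`, `sums` and `result` are only illustrative:

```python
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# keys: sorted unique values of the first row
# inverse: for each column, the index of its key within `keys`
keys, inverse = np.unique(a[0], return_inverse=True)

# Accumulate every second-row value into the slot of its key.
# np.add.at handles repeated indices correctly (unbuffered addition).
sums = np.zeros(len(keys), dtype=a.dtype)
np.add.at(sums, inverse, a[1])

result = np.vstack((keys, sums))
# result is [[1, 2, 3], [40, 20, 100]]
```

This keeps the integer dtype of the input and needs no extra dependency.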
sacuL
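
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# One output column per distinct value in the first row
unique_values = np.unique(a[0])
new_array = np.zeros((2, len(unique_values)), dtype=a.dtype)
for i, uniq in enumerate(unique_values):
    new_array[0, i] = uniq
    # Sum the second-row values whose first-row key equals uniq
    new_array[1, i] = a[1][a[0] == uniq].sum()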
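
The loop above scans the whole array once per unique value, which may get slow when there are many distinct keys. A loop-free alternative, sketched here under the assumption that the array is non-empty, sorts the columns by key and sums each run of equal keys with `np.add.reduceat`. This is a different technique from the loop, and the names `order`, `starts` and `result` are only illustrative:

```python
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# Sort the columns by key so that equal keys become adjacent
order = np.argsort(a[0], kind="stable")
sorted_keys = a[0][order]
sorted_vals = a[1][order]

# Index of the first element of each run of equal keys
is_start = np.r_[True, sorted_keys[1:] != sorted_keys[:-1]]
starts = np.flatnonzero(is_start)

# Sum the values of each run, from one start index up to the next
sums = np.add.reduceat(sorted_vals, starts)

result = np.vstack((sorted_keys[starts], sums))
# result is [[1, 2, 3], [40, 20, 100]]
```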
onno

I don't think you'll get much more efficient than using a dictionary for the sums and then creating the array from that:

from collections import defaultdict
import numpy

sums = defaultdict(float)

arr = numpy.array([[ 1,  2,  1,  3,  3],
                   [10, 20, 30, 40, 60]])

for key, value in zip(*arr):
    sums[key] += value


numpy.array(list(sums.items())).T

returns

array([[  1.,   2.,   3.],
       [ 40.,  20., 100.]])
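
Note that the dictionary version returns floats and lists the keys in first-seen order, which only happens to be sorted for this input. If integer output and sorted keys matter, a small variation might look like the sketch below. `group_sum_dict` is a hypothetical helper name, and the example input is made up to show the reordering:

```python
from collections import defaultdict
import numpy as np

def group_sum_dict(arr):
    """Sum second-row values per first-row key; keys come back sorted."""
    sums = defaultdict(int)  # int accumulator keeps integer results
    # .tolist() converts NumPy scalars to plain Python ints
    for key, value in zip(arr[0].tolist(), arr[1].tolist()):
        sums[key] += value
    keys = sorted(sums)
    return np.array([keys, [sums[k] for k in keys]])

# Keys deliberately out of order to show the sorting step
arr = np.array([[ 3,  1,  3,  2],
                [60, 10, 40, 20]])
result = group_sum_dict(arr)
# result is [[1, 2, 3], [10, 20, 100]]
```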
chthonicdaemon

You can use a scipy.sparse.csr_matrix:

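import numpy as np
from scipy import sparse

# a is the 2-row array from the question
b = a[0]            # keys, used as column indices (must be non-negative integers)
v = a[1]            # values to sum
m = b.max() + 1     # number of columns: one per possible key 0..max
s = v.shape[0]      # number of rows: one per input column

# Row i holds v[i] in column b[i]; summing over rows adds up values per key
res = sparse.csr_matrix((v, b, np.arange(s+1)), (s, m)).sum(0)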

matrix([[  0,  40,  20, 100]], dtype=int32)

This gives the sum for every integer key from 0 to a[0].max(), including keys that never occur (their sum is 0). To map it back to your desired result:

t = np.unique(a[0])
np.stack((t, res.A1[t]))

array([[  1,   2,   3],
       [ 40,  20, 100]])
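
The same "one slot per integer key" idea can also be sketched without SciPy by using `np.bincount` with its `weights` argument. This is a swapped-in alternative rather than the sparse-matrix method itself, and like it, it assumes the keys are non-negative integers. The name `totals` is only illustrative:

```python
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# totals[k] is the sum of second-row values whose key equals k,
# for every k from 0 to a[0].max(); the result is float64
totals = np.bincount(a[0], weights=a[1])

# Keep only the keys that actually occur, and restore the integer dtype
t = np.unique(a[0])
result = np.stack((t, totals[t].astype(a.dtype)))
# result is [[1, 2, 3], [40, 20, 100]]
```

As with the sparse matrix, memory use grows with the largest key rather than with the number of distinct keys, so this suits small, dense key ranges best.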
user3483203