Create unique row in 2D numpy array by adding corresponding values

Question

I have a 2-dimensional NumPy array in Python:

[[ 1  2  1  3  3]
 [10 20 30 40 60]]

I would like to keep only the unique values in the first row and, for each of them, add together the corresponding values in the second row before deleting the duplicate columns. So, the output for my array would be this:

[[  1   2   3 ]
 [ 40  20 100 ]]

I'm new to Python and I can't think of an efficient way of doing this at larger scales.

sacuL · Accepted Answer · 2018-10-30T22:45:13.050

3

Unfortunately, numpy doesn't have a built-in groupby function (though there are ways to write one). If you're open to using pandas, this is straightforward:

>>> import pandas as pd
>>> # a is the 2-D array from the question; each row of a.T is a (key, value) pair
>>> pd.DataFrame(a.T).groupby(0, as_index=False).sum().values.T

array([[  1,   2,   3],
       [ 40,  20, 100]])
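
As for writing a groupby in numpy itself, here is a minimal sketch of one possible approach (not necessarily what the answer had in mind): sort the columns by key, find where each run of equal keys starts, and sum each run with `np.add.reduceat`. The names `order`, `starts`, `keys` and `sums` are my own.

```python
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# Sort the columns by key so that equal keys become adjacent
order = np.argsort(a[0], kind="stable")
sorted_keys = a[0][order]
sorted_vals = a[1][order]

# Index of the first element of every run of equal keys
starts = np.flatnonzero(
    np.concatenate(([True], sorted_keys[1:] != sorted_keys[:-1]))
)

keys = sorted_keys[starts]
# Sum each run: reduceat adds sorted_vals[starts[i]:starts[i + 1]]
sums = np.add.reduceat(sorted_vals, starts)

result = np.stack((keys, sums))
```

This keeps the integer dtype of the input and does not require the keys to be small or non-negative, since they are only compared and sorted.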

edited Oct 30 '18 at 22:45

answered Oct 30 '18 at 15:46


1

Instead of transposing: `pd.DataFrame(a[1:], columns=a[0]).groupby(level=0, axis=1).sum().values` – user3483203 Oct 30 '18 at 16:03
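
A note on the comment above: as far as I know, newer pandas releases deprecate the `axis=1` argument of `DataFrame.groupby`, so that one-liner may warn or fail depending on your version. A minimal sketch of another way to avoid the transpose, assuming it is acceptable to group a `Series` of the values by the array of keys:

```python
import numpy as np
import pandas as pd

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# Group the second row by the labels in the first row and sum each group
grouped = pd.Series(a[1]).groupby(a[0]).sum()

# The group labels end up in the index, the sums in the values
result = np.stack((grouped.index.to_numpy(), grouped.to_numpy()))
```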

score 0 · Answer 2 · answered Oct 30 '18 at 15:45

import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

unique_values = np.unique(a[0])
new_array = np.zeros((2, len(unique_values)))
for i, uniq in enumerate(unique_values):
    new_array[0][i] = uniq
    # Replace values whose key differs from uniq with 0, then sum the rest
    new_array[1][i] = np.where(a[0]==uniq,a[1],0).sum()
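
The same idea can be written without the Python-level loop by comparing all unique keys against the first row at once. This is only a sketch: the `mask` array is my own addition, and it needs memory proportional to (number of unique keys) x (number of columns), so it is probably only sensible when there are few distinct keys.

```python
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

unique_values = np.unique(a[0])

# mask[i, j] is True where column j of a carries the i-th unique key
mask = unique_values[:, None] == a[0]

# Keep only the matching values in each row of the mask, then sum per key
sums = np.where(mask, a[1], 0).sum(axis=1)

result = np.stack((unique_values, sums))
```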

score 0 · Answer 3 · answered Oct 30 '18 at 15:45

I don't think you'll get much more efficient than using a dictionary to accumulate the sums and then creating the array from that:

from collections import defaultdict
import numpy

sums = defaultdict(float)

arr = numpy.array([[ 1,  2,  1,  3,  3],
                   [10, 20, 30, 40, 60]])

# zip(*arr) pairs each key in the first row with its value in the second row
for key, value in zip(*arr):
    sums[key] += value

numpy.array(list(sums.items())).T

returns

array([[  1.,   2.,   3.],
       [ 40.,  20., 100.]])
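
Whether the dictionary really is the fastest option probably depends on the size of the array, so it may be worth measuring on your own data. Below is a rough sketch of such a comparison. The helper names `dict_sum` and `unique_sum`, the random test data, and the choice of `np.unique` plus `np.bincount` as the alternative are all my own assumptions, and no timings are claimed here.

```python
import timeit
from collections import defaultdict

import numpy as np


def dict_sum(arr):
    """Accumulate the sums in a dictionary, as in the answer above."""
    sums = defaultdict(float)
    for key, value in zip(*arr):
        sums[key] += value
    # Sort by key so the output is comparable with unique_sum
    return np.array(sorted(sums.items())).T


def unique_sum(arr):
    """NumPy-only variant: map keys to 0..k-1, then do a weighted count."""
    keys, inverse = np.unique(arr[0], return_inverse=True)
    return np.stack((keys, np.bincount(inverse, weights=arr[1])))


# Hypothetical test data: 100,000 columns with keys in [0, 100)
rng = np.random.default_rng(0)
big = np.stack((rng.integers(0, 100, size=100_000),
                rng.integers(0, 1_000, size=100_000)))

# Both variants should agree before their speed is compared
assert np.array_equal(dict_sum(big), unique_sum(big))

for fn in (dict_sum, unique_sum):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds:.4f} s for 5 runs")
```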

score 0 · Answer 4 · answered Oct 30 '18 at 15:59

You can use a sparse.csr_matrix:

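import numpy as np
from scipy import sparse

# a is the 2-D array from the question
b = a[0]           # keys, used as column indices
v = a[1]           # values to be summed
m = b.max() + 1    # one column per possible key 0..max
s = v.shape[0]

# One stored entry per row: row i holds v[i] in column b[i]; sum(0) adds the columns up
res = sparse.csr_matrix((v, b, np.arange(s+1)), (s, m)).sum(0)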

matrix([[  0,  40,  20, 100]], dtype=int32)

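Here res holds the sum for every integer key from 0 to a[0].max(), including keys that never occur (such as 0, whose sum is 0), so this approach assumes non-negative integer keys. To link it back to your expected result, index it with the unique keys: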

t = np.unique(a[0])
np.stack((t, res.A1[t]))

array([[  1,   2,   3],
       [ 40,  20, 100]])
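
If scipy is not available, the same "one slot per possible key" idea can be sketched with a weighted `np.bincount`. This is my own variation rather than part of the answer, and it carries the same assumption that the keys are non-negative integers. Note that `np.bincount` with `weights` returns floats, hence the cast back to the input dtype.

```python
import numpy as np

a = np.array([[ 1,  2,  1,  3,  3],
              [10, 20, 30, 40, 60]])

# totals[k] is the sum of all values whose key equals k, for k = 0..a[0].max()
totals = np.bincount(a[0], weights=a[1])

# Keep only the keys that actually occur and restore the integer dtype
labels = np.unique(a[0])
result = np.stack((labels, totals[labels].astype(a.dtype)))
```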
