I wouldn't say this is a duplicate, but the related question you mentioned is a good place to start. Most of the answers in your link require sorting the array, extracting the indices where the groups begin, and then calling np.split on it. That doesn't fit here, because it would return a list of groups of unequal size, so each group would still have to be summed separately.
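To make that concrete, here is a minimal sketch of the sort-and-split approach. The sample array and the variable names are made up for illustration and are not taken from the question:

```python
import numpy as np

# Hypothetical sample data: column 0 is the group key, column 1 the value.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [1.0, 30.0],
              [3.0, 40.0],
              [1.0, 50.0]])

# Sort rows by key so that equal keys become adjacent.
x_sorted = x[np.argsort(x[:, 0], kind="stable")]

# Positions where the key changes, i.e. where a new group begins.
starts = np.flatnonzero(np.diff(x_sorted[:, 0])) + 1

# np.split returns a list of arrays, one per group, of differing lengths.
groups = np.split(x_sorted[:, 1], starts)
sizes = [len(g) for g in groups]

# Summing therefore needs a Python-level loop over the groups.
sums = [float(g.sum()) for g in groups]
print(sizes, sums)
```

The list of ragged arrays is the sticking point: there is no single vectorized call that sums all of them at once.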
Instead, you can use np.bincount. With the weights argument it sums the weights for each label instead of counting occurrences, which is exactly a group-by sum; only the group keys are absent from the output.
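import numpy as np
def group_by_sum(x):
    # u: sorted unique keys; idx: integer label of each row's key
    u, idx = np.unique(x[:, 0], return_inverse=True)
    # sum the values in column 1 per label
    s = np.bincount(idx, weights=x[:, 1])
    return np.c_[u, s]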
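As a quick check, this is how the function behaves on a small made-up array (the data is hypothetical, chosen only so the sums are easy to verify by hand):

```python
import numpy as np

def group_by_sum(x):
    # Same function as above, repeated so the snippet runs on its own.
    u, idx = np.unique(x[:, 0], return_inverse=True)
    s = np.bincount(idx, weights=x[:, 1])
    return np.c_[u, s]

# Hypothetical sample data: column 0 is the group key, column 1 the value.
x = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [1.0, 30.0],
              [3.0, 40.0],
              [1.0, 50.0]])

# The inverse index maps every row to the position of its key in u.
u, idx = np.unique(x[:, 0], return_inverse=True)
print(idx)      # labels for keys 1, 2, 1, 3, 1

result = group_by_sum(x)
print(result)   # one row per key: key, sum of its values
```

Key 1 collects 10 + 30 + 50, while keys 2 and 3 each have a single value, so the rows come out as (1, 90), (2, 20), (3, 40).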
Bonus: it's actually a one-liner with the numpy_indexed package:
import numpy_indexed as npi
np.transpose(npi.group_by(x[:, 0]).sum(x[:, 1]))
Benchmarking
import numpy as np
import perfplot
import matplotlib.pyplot as plt
def bincount(x):
    u, idx = np.unique(x[:, 0], return_inverse=True)
    s = np.bincount(idx, weights=x[:, 1])
    return np.c_[u, s]

def reduceat(x):
    x = x[np.argsort(x[:, 0])]
    i = np.flatnonzero(np.diff(x[:, 0]))
    i = np.r_[0, i + 1]
    s = np.add.reduceat(x[:, 1], i)
    return np.stack((x[i, 0], s), axis=-1)
def setup(N, s):
    x = np.linspace(0, 1, N + 1)[np.random.randint(N, size=s)]
    return np.c_[x, (x**2) % 1]

def build_args(k):
    return {'setup': lambda x: setup(k, x),
            'kernels': [bincount, reduceat],
            'n_range': [2**k for k in range(1, 20)],
            'title': f'Testing for x samples in [0, 1] with no more than {k} groups',
            'show_progress': True,
            'equality_check': False}
outs = [perfplot.bench(**build_args(n)) for n in (10, 100, 1000, 10000)]
fig = plt.figure(figsize=(20, 20))
for i in range(len(outs)):
    ax = fig.add_subplot(2, 2, i + 1)
    ax.grid(True, which="both")
    outs[i].plot()
plt.show()
