This task is deceptively difficult to do without loops! You'll need a bunch of high-level numpy
tricks to make it work. I sort of fly through them here, but I will try to link to other resources where I can.
From here, the best way to do row-wise comparison is:
a = np.array([[0,1,1,2,2],
[0,1,1,2,2],
[0,2,2,2,2]])
b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
b
array([[[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
[[0 0 0 0 1 0 0 0 1 0 0 0 2 0 0 0 2 0 0 0]],
[[0 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0 2 0 0 0]]],
dtype='|V20')
b.shape
(3, 1)
Notice that the innermost brackets are not an additional dimension, but an np.void
object that can be compared with things like np.unique
.
Still, getting the indices you want to keep isn't really easy, but here's the one-liner:
c = np.flatnonzero(np.r_[1, np.diff(np.unique(b, return_inverse = 1)[1])])
Eech. It's kinda messy. Basically you're looking for the indices where the lines change, and the first line. Normally you wouldn't need the np.unique
call and could just do np.diff(b)
, but you can't subtract np.void
. np.r_
is a shortcut for np.concatenate
that's a bit more readable. And np.flatnonzero
gives you the indices where your new array isn't zero (i.e. the indices you want to keep)
c
array([0, 2], dtype=int32)
There, now you can use some fancy ufunc.reduceat
math to do your addition:
d = np.add.reduceat(a, c, axis = 0)
d
array([[0, 2, 2, 4, 4],
[0, 2, 2, 2, 2]], dtype=int32)
OK, now to add the zeros, we'll just plug that into an np.zero
array using advanced indexing
e = np.zeros_like(a)
e[c] = d
e
array([[0, 2, 2, 4, 4],
[0, 0, 0, 0, 0],
[0, 2, 2, 2, 2]])
And there we go! You can go in other directions by transposing or flipping the matrix at the beginning and the end.
def reduce_duplicates(a):
b = np.ascontiguousarray(a).view(np.dtype((np.void, a.dtype.itemsize * a.shape[1])))
c = np.flatnonzero(np.r_[1, np.diff(np.unique(b, return_inverse = 1)[1])])
d = np.add.reduceat(a, c, axis = 0)
e = np.zeros_like(a)
e[c] = d
return e
reduce_duplicates(a.T[::-1,:])[::-1,:].T #reducing right
array([[0, 0, 2, 0, 4],
[0, 0, 2, 0, 4],
[0, 0, 4, 0, 4]])
I don't have numba
so I can't test speed against the other suggestion (knowing numba
it is probably slower), but it is loopless and numpy.