A scipy.sparse matrix takes this kind of information, but only for 2d:

sparse.coo_matrix((data, (row, col)))

where row and col are indices like your X, Y and Z. It sums duplicates.
The first step to doing that is a lexical sort of the indices, which puts points with matching coordinates next to each other. The actual grouping and summing is done, I believe, in compiled code. Part of the difficulty in doing this fast in numpy terms is that there will be a variable number of elements in each group: some coordinates will be unique, others may appear 3 or more times.
Python's itertools has a groupby tool. Pandas also has grouping functions. I can also imagine using a defaultdict to group and sum values.
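For instance, a minimal defaultdict sketch (using the same X, Y, Z, A arrays as in the session below; the variable names are my own):

```python
from collections import defaultdict

import numpy as np

X = np.array([13, 9, 15, 13, 13, 15])
Y = np.array([7, 2, 3, 7, 7, 3])
Z = np.array([21, 7, 9, 21, 21, 9])
A = np.array([1.5, 0.5, 1.1, 0.9, 1.7, 1.1])

# accumulate values keyed by the (x, y, z) coordinate tuple;
# duplicate coordinates are summed automatically
sums = defaultdict(float)
for x, y, z, a in zip(X, Y, Z, A):
    sums[(x, y, z)] += a
```

This is a plain Python loop, so it won't match compiled speed on large arrays, but it handles any number of index dimensions.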
The ufunc reduceat might also work, though it's easier to use in 1d than in 2 or 3.
If you are ignoring Z, the sparse coo_matrix approach may be easiest.
In [2]: X=np.array([13,9,15,13,13,15])
In [3]: Y=np.array([7,2,3,7,7,3])
In [4]: A=np.array([1.5,0.5,1.1,0.9,1.7,1.1])
In [5]: M=sparse.coo_matrix((A,(X,Y)))
In [15]: M.sum_duplicates()
In [16]: M.data
Out[16]: array([ 0.5, 2.2, 4.1])
In [17]: M.row
Out[17]: array([ 9, 15, 13])
In [18]: M.col
Out[18]: array([2, 3, 7])
In [19]: M
Out[19]:
<16x8 sparse matrix of type '<class 'numpy.float64'>'
with 3 stored elements in COOrdinate format>
Here's what I had in mind with lexsort:
In [32]: Z=np.array([21,7,9,21,21,9])
In [33]: xyz=np.stack([X,Y,Z],1)
In [34]: idx=np.lexsort([X,Y,Z])
In [35]: idx
Out[35]: array([1, 2, 5, 0, 3, 4], dtype=int32)
In [36]: xyz[idx,:]
Out[36]:
array([[ 9, 2, 7],
[15, 3, 9],
[15, 3, 9],
[13, 7, 21],
[13, 7, 21],
[13, 7, 21]])
In [37]: A[idx]
Out[37]: array([ 0.5, 1.1, 1.1, 1.5, 0.9, 1.7])
When sorted like this it becomes more evident that the Z coordinate is 'redundant', at least for this purpose.

Using reduceat to sum groups:
In [40]: np.add.reduceat(A[idx],[0,1,3])
Out[40]: array([ 0.5, 2.2, 4.1])
(for now I just eyeballed the [0,1,3] list)
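That boundary list can be computed instead of eyeballed: after the lexsort, a new group starts at row 0 and wherever a sorted coordinate row differs from the one before it. A sketch, rebuilding the same arrays as above:

```python
import numpy as np

X = np.array([13, 9, 15, 13, 13, 15])
Y = np.array([7, 2, 3, 7, 7, 3])
Z = np.array([21, 7, 9, 21, 21, 9])
A = np.array([1.5, 0.5, 1.1, 0.9, 1.7, 1.1])

xyz = np.stack([X, Y, Z], 1)
idx = np.lexsort([X, Y, Z])
s = xyz[idx]          # coordinates in sorted order

# group boundaries: row 0, plus every row that differs from its predecessor
starts = np.concatenate([[0], np.nonzero((s[1:] != s[:-1]).any(axis=1))[0] + 1])
sums = np.add.reduceat(A[idx], starts)
```

With this data `starts` comes out as [0, 1, 3] and `sums` as [0.5, 2.2, 4.1], matching the eyeballed version; `s[starts]` gives the corresponding unique coordinates.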