Approach 1a
This can be done with np.lexsort, np.unique and np.ufunc.reduceat:
A = A[np.lexsort((A[:, 1], A[:, 0]))]  # sort by column 0, then column 1, so groups are contiguous in np.unique's row order
_, idx = np.unique(A[:, :2], return_index=True, axis=0)
output = np.maximum.reduceat(A, idx)
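For concreteness, here is a tiny made-up input (not the original data) and what these three lines produce; columns 0 and 1 are the group keys, column 2 is the value being maximized:
import numpy as np

A = np.array([[1., 2., 1.],
              [0., 0., 1.],
              [1., 1., 3.],
              [0., 0., 2.],
              [1., 1., 0.]])
A = A[np.lexsort((A[:, 1], A[:, 0]))]                     # groups become contiguous blocks
_, idx = np.unique(A[:, :2], return_index=True, axis=0)   # idx marks the start of each block
print(np.maximum.reduceat(A, idx))
# [[0. 0. 2.]
#  [1. 1. 3.]
#  [1. 2. 1.]]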
Approach 1b
We can also speed up the reduceat step slightly by reducing only the value column:
A = A[np.lexsort((A[:, 1], A[:, 0]))]  # same sort as above
u, idx = np.unique(A[:, :2], return_index=True, axis=0)
output = np.c_[u, np.maximum.reduceat(A[:, 2], idx)]
Approach 2
If you want a simpler approach, you could also use numpy_indexed, which is built on top of numpy and lets you solve grouping problems with less code:
import numpy_indexed as npi
_, idx = npi.group_by(A[:, :2]).argmax(A[:, 2])
output = A[idx]
Note that it outperforms the previous approaches, which suggests the pure numpy versions still leave room for optimization.
Approach 3
Some pandas methods are faster than their numpy counterparts; it seems you got lucky with this one:
import pandas as pd
df = pd.DataFrame(A)
output = df.loc[df.groupby([0, 1])[2].idxmax()].values
Output
All approaches returned np.array([[0., 0., 2.], [1., 1., 3.], [1., 2., 1.]]), except the npi approach, which returned the same rows in a different order: np.array([[0., 0., 2.], [1., 2., 1.], [1., 1., 3.]]).
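If you need the npi result in the same row order as the other approaches, a single extra sort on the key columns fixes it (output here is the variable from Approach 2; this is just an optional post-processing step, not part of the original recipe):
output = output[np.lexsort((output[:, 1], output[:, 0]))]  # reorder rows by (column 0, column 1)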
Update
You can optimize even further by running the same algorithms on a flattened array, i.e. after reducing the first two columns to a single 1D key. The numexpr package is able to reach extreme speed on this encoding step. The new methods are named approach1a_on1D, approach1b_on1D and approach2_on1D.
import numexpr as ne
import numpy as np
import numpy_indexed as npi
import pandas as pd

def reduct_dims(cubes):
    # encode the first two columns into a single integer key: (c0 - m0) + (c1 - m1) * s0
    m0, m1 = np.min(cubes[:, :2], axis=0)
    M0 = np.max(cubes[:, 0], axis=0)
    s0 = M0 - m0 + 1                      # range of column 0
    d = {'c0': cubes[:, 0], 'c1': cubes[:, 1], 'c2': cubes[:, 2],
         's0': s0, 'm0': m0, 'm1': m1}
    return ne.evaluate('c0-m0+(c1-m1)*s0', d)
def approach1a(A):
    A = A[np.lexsort((A[:, 1], A[:, 0]))]  # sort to match np.unique's row order
    _, idx = np.unique(A[:, :2], return_index=True, axis=0)
    return np.maximum.reduceat(A, idx)
def approach1b(A):
    A = A[np.lexsort((A[:, 1], A[:, 0]))]
    u, idx = np.unique(A[:, :2], return_index=True, axis=0)
    return np.c_[u, np.maximum.reduceat(A[:, 2], idx)]
def approach2(A):
_, idx = npi.group_by(A[:, :2]).argmax(A[:, 2])
return A[idx]
def approach3(A):
df = pd.DataFrame(A)
return df.loc[df.groupby([0,1])[2].idxmax()].values
def approach1a_on1D(A):
A_as_1D = reduct_dims(A)
argidx = np.argsort(A_as_1D)
A, A_as_1D = A[argidx], A_as_1D[argidx] #sort both arrays
_, idx = np.unique(A_as_1D, return_index = True)
return np.maximum.reduceat(A, idx)
def approach1b_on1D(A):
A_as_1D = reduct_dims(A)
argidx = np.argsort(A_as_1D)
A, A_as_1D = A[argidx], A_as_1D[argidx]
_, idx = np.unique(A_as_1D, return_index = True)
return np.c_[A[:,:2][idx], np.maximum.reduceat(A[:,2], idx)]
def approach2_on1D(A):
A_as_1D = reduct_dims(A)
_, idx = npi.group_by(A_as_1D).argmax(A[:,2])
return A[idx]
%timeit reduct_dims(cubes)
160 ms ± 7.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
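As a sanity check, the numexpr expression is just an affine index calculation, so a plain NumPy version should produce identical keys. The helper name reduct_dims_np below is my own and exists only to verify the encoding; numexpr is used purely for speed:
def reduct_dims_np(cubes):
    # same 1D key as reduct_dims, computed without numexpr
    c0 = cubes[:, 0] - cubes[:, 0].min()
    c1 = cubes[:, 1] - cubes[:, 1].min()
    return c1 * (c0.max() + 1) + c0

assert np.array_equal(reduct_dims(cubes), reduct_dims_np(cubes))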
Analysis of performance with perfplot
I've tested the efficiency of every method on realistic point cloud data from lidar, given as 3D values in centimeters (about 20M points, of which about 1M are distinct). The fastest version runs in about 2 seconds. Let's see the results with perfplot:
import tensorflow as tf
import perfplot
import matplotlib.pyplot as plt
from time import time
path = tf.keras.utils.get_file('cubes.npz', 'https://github.com/loijord/lidar_home/raw/master/cubes.npz')
cubes = np.load(path)['array'].astype(np.int64) // 50
t = time()
fig = plt.figure(figsize=(15, 10))
plt.grid(True, which="both")
out = perfplot.bench(
setup = lambda x: cubes[:x],
kernels = [approach1a, approach1b, approach2, approach3, approach1a_on1D, approach1b_on1D, approach2_on1D],
n_range = [2 ** k for k in range(22)],
xlabel = 'cubes[:n]',
title = 'Testing groupby max on cubes',
show_progress = False,
equality_check = False)
out.show()
print('Overall testing time:', time() - t)
# Overall testing time: 129.78826427459717
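Because the approaches return rows in different orders, equality_check is disabled in the benchmark above. If you want verification anyway, a hypothetical order-insensitive comparison like the one below could be passed as equality_check=rows_equal, assuming perfplot accepts a custom callable taking two results (which I believe it does):
def rows_equal(a, b):
    # compare two results ignoring row order: sort both by all columns, then compare
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return np.array_equal(a[np.lexsort(a.T[::-1])], b[np.lexsort(b.T[::-1])])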
