
Suppose I have a numpy array of a point cloud (shape Nx3, where each row is (x, y, z)). I want to generate a heightmap from the point cloud by projecting it onto the xy plane. For each grid cell on the xy plane, I only want to keep the maximum z value among all the points that project to that cell.

For example,

A=np.array([[0, 0, 1],
            [0, 0, 1.5],
            [0, 0, 2.0],
            [1, 1, 1],
            [1, 2, 1],
            [1, 1, 3]])

Then the output I am expecting is

B=np.array([[0, 0, 2.0],
            [1, 1, 3],
            [1, 2, 1]])

How can I do this operation efficiently in numpy? Thanks.


1 Answer


Approach 1a

This can be done with np.lexsort, np.unique and np.ufunc.reduceat. One caveat: np.lexsort treats its last key as the primary one, while np.unique(..., axis=0) orders rows with column 0 as the most significant, so the sort keys must be (A[:, 1], A[:, 0]) to make the two orders match (otherwise the indices fed to reduceat are not guaranteed to be ascending):

A = A[np.lexsort((A[:, 1], A[:, 0]))]
_, idx = np.unique(A[:, :2], return_index=True, axis=0)
output = np.maximum.reduceat(A, idx)
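
On the question's example this produces the expected B (a quick check of my own; the x and y columns are constant within each group, so taking the group-wise maximum leaves them unchanged):

print(output)
# [[0. 0. 2.]
#  [1. 1. 3.]
#  [1. 2. 1.]]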

Approach 1b

We can also make the reduceat step slightly more efficient by reducing only the z column and attaching the unique (x, y) pairs that np.unique already returns:

A = A[np.lexsort((A[:, 1], A[:, 0]))]
u, idx = np.unique(A[:, :2], return_index=True, axis=0)
output = np.c_[u, np.maximum.reduceat(A[:, 2], idx)]

Approach 2

If you want a simpler approach, you can also use numpy_indexed, which is built on top of numpy and lets you solve grouping problems with less code:

import numpy_indexed as npi
_, idx = npi.group_by(A[:, :2]).argmax(A[:, 2])
output = A[idx]
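
On the question's example this returns the same rows but, as noted in the Output section below, in a different order (a quick check of my own):

print(output)
# [[0. 0. 2.]
#  [1. 2. 1.]
#  [1. 1. 3.]]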

Note that it outperforms the previous approach, which is a signal that the plain numpy version could be optimized further.

Approach 3

Some pandas methods work faster than their numpy counterparts. It seems you got lucky with this approach:

import pandas as pd
df = pd.DataFrame(A)
output = df.loc[df.groupby([0, 1])[2].idxmax()].values
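
If you only need the (x, y, max z) triples rather than the original rows, a slightly shorter pandas variant is possible (my sketch, same idea):

# group on the x and y columns and take the max z directly;
# as_index=False keeps the group keys as regular columns
output = pd.DataFrame(A).groupby([0, 1], as_index=False)[2].max().values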

Output

All the approaches returned np.array([[0., 0., 2.], [1., 1., 3.], [1., 2., 1.]]), except the npi approach, which returned the same rows in a different order: np.array([[0., 0., 2.], [1., 2., 1.], [1., 1., 3.]]).
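
Since the row order may differ, an order-insensitive comparison is the safest way to confirm the outputs agree. A minimal helper of my own:

def same_rows(a, b):
    # sort both row sets lexicographically (column 0 primary) before comparing
    a = a[np.lexsort(a.T[::-1])]
    b = b[np.lexsort(b.T[::-1])]
    return np.array_equal(a, b)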

Update

You can optimize even further by running the same algorithms on a flattened array that results from reducing the two (x, y) columns to a single 1D key. The numexpr package is able to reach extreme speed here. The new methods are named approach1a_on1D, approach1b_on1D and approach2_on1D.

import numexpr as ne

def reduct_dims(cubes):
    # map each (x, y) pair to a unique 1D integer key:
    # (x - x_min) + (y - y_min) * x_range
    m0, m1 = np.min(cubes[:,:2], axis=0)
    M0 = np.max(cubes[:,0], axis=0)
    s0  = M0 - m0 + 1
    d = {'c0':cubes[:,0],'c1':cubes[:,1],'c2':cubes[:,2],
         's0':s0,'m0':m0, 'm1':m1}
    return ne.evaluate('c0-m0+(c1-m1)*s0', d)
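
# For clarity, an equivalent pure-numpy version of reduct_dims (my addition;
# slower than the numexpr one on large arrays). It assumes integer coordinates,
# as with the centimeter lidar data below, so distinct (x, y) pairs are
# guaranteed to get distinct 1D keys.
def reduct_dims_np(cubes):
    m0, m1 = np.min(cubes[:, :2], axis=0)
    s0 = np.max(cubes[:, 0]) - m0 + 1          # extent along x
    return (cubes[:, 0] - m0) + (cubes[:, 1] - m1) * s0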

def approach1a(A):
    A = A[np.lexsort((A[:, 1], A[:, 0]))]  # sort to match np.unique's row order
    _, idx = np.unique(A[:, :2], return_index = True, axis=0)
    return np.maximum.reduceat(A, idx)

def approach1b(A):
    A = A[np.lexsort((A[:, 1], A[:, 0]))]  # sort to match np.unique's row order
    u, idx = np.unique(A[:, :2], return_index = True, axis=0)
    return np.c_[u, np.maximum.reduceat(A[:,2], idx)]

def approach2(A):
    _, idx = npi.group_by(A[:, :2]).argmax(A[:, 2])
    return A[idx]

def approach3(A):
    df = pd.DataFrame(A)
    return df.loc[df.groupby([0,1])[2].idxmax()].values

def approach1a_on1D(A):
    A_as_1D = reduct_dims(A)
    argidx = np.argsort(A_as_1D)
    A, A_as_1D = A[argidx], A_as_1D[argidx] #sort both arrays
    _, idx = np.unique(A_as_1D, return_index = True)
    return np.maximum.reduceat(A, idx)

def approach1b_on1D(A):
    A_as_1D = reduct_dims(A)
    argidx = np.argsort(A_as_1D)
    A, A_as_1D = A[argidx], A_as_1D[argidx]
    _, idx = np.unique(A_as_1D, return_index = True)
    return np.c_[A[:,:2][idx], np.maximum.reduceat(A[:,2], idx)]

def approach2_on1D(A):
    A_as_1D = reduct_dims(A)
    _, idx = npi.group_by(A_as_1D).argmax(A[:,2])
    return A[idx]
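
Before benchmarking, a quick sanity check of my own on the question's example, using the same_rows helper from the Output section above:

ref = approach1a(A)
for f in (approach1b, approach2, approach3,
          approach1a_on1D, approach1b_on1D, approach2_on1D):
    assert same_rows(ref, f(A))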

%timeit reduct_dims(cubes)

160 ms ± 7.44 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Analysis of performance with perfplot

I've tested the efficiency of every method on realistic point cloud data from a lidar scan, given as 3D values in centimeters (about 20M points, roughly 1M of them distinct). The fastest version ran in 2 seconds. Let's see the results with perfplot:

import tensorflow as tf
import perfplot
import matplotlib.pyplot as plt
from time import time
path = tf.keras.utils.get_file('cubes.npz', 'https://github.com/loijord/lidar_home/raw/master/cubes.npz')
cubes = np.load(path)['array'].astype(np.int64) // 50
t = time()
fig = plt.figure(figsize=(15, 10))
plt.grid(True, which="both")
out = perfplot.bench(
        setup = lambda x: cubes[:x],
        kernels = [approach1a, approach1b, approach2, approach3, approach1a_on1D, approach1b_on1D, approach2_on1D],
        n_range = [2 ** k for k in range(22)],
        xlabel = 'cubes[:n]',
        title = 'Testing groupby max on cubes',
        show_progress = False,
        equality_check = False)
out.show()
print('Overall testing time:', time() -t)
# Overall testing time: 129.78826427459717

[perfplot chart: runtime of all seven approaches vs. n, titled 'Testing groupby max on cubes']

  • @tczj I believe this can be optimized further. You might also like to see other approaches for the 2D scenario in this [question](https://stackoverflow.com/a/63758412/3044825) – mathfux Oct 11 '20 at 16:57