I've been working with image transformations recently and ran into a situation where I have a large array (shape 100,000 x 3) in which each row represents a point in 3D space, like:
pnt = [x y z]
All I'm trying to do is iterate over the points and matrix-multiply each one by a matrix called T (shape 3 x 3).
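For anyone who wants to reproduce this, the inputs can be mocked up roughly as below. This is only a sketch: the random data, the seed, and the random T are assumptions standing in for my real point cloud and transform.

```python
import numpy as np

# Assumed stand-in data; the real point cloud and T come from elsewhere.
rng = np.random.default_rng(0)
pnt_cloud = rng.standard_normal((100_000, 3))  # one [x, y, z] point per row
T = rng.standard_normal((3, 3))                # placeholder 3 x 3 transform

# The per-point operation performed inside the loop:
pnt = pnt_cloud[0]
xyz_pnt = T @ pnt  # equivalent to np.dot(T, pnt) for a 2-D T and a 1-D pnt
```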
Test with NumPy:
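import numpy as np

def transform(pnt_cloud, T):
    # Output buffer: one slot per point, left at zero where the x-component is not positive
    arr = np.zeros(pnt_cloud.shape[0])
    i = 0
    for pnt in pnt_cloud:
        xyz_pnt = np.dot(T, pnt)  # transform a single point
        if xyz_pnt[0] > 0:
            arr[i] = xyz_pnt[0]
        i += 1
    return arr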
Calling this function and timing it with %time gives the output:
Out[190]: CPU times: user 670 ms, sys: 7.91 ms, total: 678 ms
Wall time: 674 ms
Test with PyTorch tensors:
import numpy as np
import torch

tensor_cld = torch.tensor(pnt_cloud)
tensor_T = torch.tensor(T)

def transform(pnt_cloud, T):
    # Output buffer built from a NumPy array, so it is float64 (NumPy's default dtype)
    depth_array = torch.tensor(np.zeros(pnt_cloud.shape[0]))
    i = 0
    for pnt in pnt_cloud:
        xyz_pnt = torch.matmul(T, pnt)  # transform a single point
        if xyz_pnt[0] > 0:
            depth_array[i] = xyz_pnt[0]
        i += 1
    return depth_array
Calling this function on tensor_cld and tensor_T and timing it with %time gives the output:
Out[199]: CPU times: user 6.15 s, sys: 28.1 ms, total: 6.18 s
Wall time: 6.09 s
NOTE: Doing the same with torch.jit only reduces the runtime by about 2 s.
I would have thought that PyTorch tensor computations would be much faster due to the way PyTorch breaks its code down in the compiling stage. What am I missing here?
Would there be any faster way to do this other than using Numba?
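For comparison, here is a rough sketch of what I think a loop-free version would look like in plain NumPy. It assumes the index advances for every point (so each output slot lines up with its input row) and that only the first component of each transformed point is needed. The names transform_loop and transform_vectorized are made up for this example, and I have not benchmarked it.

```python
import numpy as np

def transform_loop(pnt_cloud, T):
    # Reference: same per-point logic as the loop in the question
    arr = np.zeros(pnt_cloud.shape[0])
    for i, pnt in enumerate(pnt_cloud):
        x = np.dot(T, pnt)[0]
        if x > 0:
            arr[i] = x
    return arr

def transform_vectorized(pnt_cloud, T):
    # Only the first component of T @ pnt is used, so only the first row of T matters
    x = pnt_cloud @ T[0]            # shape (N,)
    return np.where(x > 0, x, 0.0)  # keep positive values, zero elsewhere

# Small assumed test data to check that both versions agree
rng = np.random.default_rng(0)
small_cloud = rng.standard_normal((1_000, 3))
small_T = rng.standard_normal((3, 3))
loop_out = transform_loop(small_cloud, small_T)
vec_out = transform_vectorized(small_cloud, small_T)
```

If this is right, the whole computation becomes one matrix-vector product instead of 100,000 tiny ones, which is presumably where the per-call overhead in both tests comes from.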