
I have a 3D point cloud (x, y, z) in a txt file. I want to calculate the 3D distance between each point and all the other points in the cloud, and for each point save the number of points whose distance is less than a threshold. I have done it in Python in the code shown below, but it takes too much time; I am asking for something faster.

from math import sqrt
import numpy as np
points_list = []
with open("D:/Point cloud data/projection_test_data3.txt") as chp:
    for line in chp:
        x, y, z = line.split()
        points_list.append((float(x), float(y), float(z)))
j = 0
Final_density = 0
while j < len(points_list)-1:
    i = 0
    Density = 0
    while i < len (points_list) - 1 :
        if sqrt((points_list[i][0] - points_list[j][0])**2 + (points_list[i][1] - points_list[j][1])**2 + (points_list[i][2] - points_list[j][2])**2) < 0.15:
            Density += 1
        i += 1
    Final_density = Density
    with open("D:/Point cloud data/my_density.txt", 'a') as g:
        g.write("{}\n".format(str(Final_density)))
    j += 1

youssef
  • How many times do you need to write the `Final_density` value? Just once, or for every distance that satisfies the criteria? – FCo Jul 21 '22 at 15:00
  • Looks like you tried to make it as slow as possible. How many points do you have? How are they distributed? Can you share them or some code to generate a similarly distributed cloud? – Kelly Bundy Jul 21 '22 at 15:01
  • final density represents the number of points having distance less than 0.15 for the current point in the loop. So, for every point in the point cloud there is a final density value – youssef Jul 21 '22 at 15:02
  • Around 500,000 points – youssef Jul 21 '22 at 15:07
  • Is that code correct? Looks like you're comparing points with themselves and are never looking at the last point. – Kelly Bundy Jul 21 '22 at 15:13
  • No, I compare each time one point with the rest of points in the point cloud. – youssef Jul 21 '22 at 15:21

3 Answers


One quick option that might speed this up is to move the file open outside the loop, so that you're not opening and closing the output file on every iteration.

from math import sqrt
import numpy as np
points_list = []
with open("D:/Point cloud data/projection_test_data3.txt") as chp:
    for line in chp:
        x, y, z = line.split()
        points_list.append((float(x), float(y), float(z)))
j = 0
Final_density = 0
with open("D:/Point cloud data/my_density.txt", 'a') as g:
    while j < len(points_list)-1:
        i = 0
        Density = 0
        while i < len (points_list) - 1 :
            if sqrt((points_list[i][0] - points_list[j][0])**2 + (points_list[i][1] - points_list[j][1])**2 + (points_list[i][2] - points_list[j][2])**2) < 0.15:
                Density += 1
            i += 1
        Final_density = Density
        g.write("{}\n".format(str(Final_density)))
        j += 1

Since it looks like you can use numpy, why not use it? You'll have to make sure the arrays are numpy arrays, but that should be simple.

Change `if sqrt(...) < 0.15:` to `if np.linalg.norm(points_list[j] - points_list[i]) < 0.15:`.

This post (Finding 3d distances using an inbuilt function in python) has other ways to use a prebuilt function to get the 3d distance in python.

Edit thanks to @KellyBundy's comment:

You can also use `np.linalg.norm(points_list - points_list[:, None], axis=-1)` to generate a matrix representing the distances between all points in the array. The diagonal will be 0 (the distance between each point and itself) and the matrix is symmetric about the diagonal, so the upper triangle alone determines the distance between any given pair of points. Again, you'll have to put the points into a NumPy array of shape (N, 3), e.g. `np.array([[point1x, point1y, point1z], [point2x, point2y, point2z], ...])` (https://stackoverflow.com/a/46700369/2391458).

resulting matrix of the form:

[[0                                 distance between points 0 and 1    distance between points 0 and 2 ...]
 [distance between points 1 and 0   0                                  distance between points 1 and 2 ...]
 ...]
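For the ~500,000 points mentioned in the comments, the full N×N matrix won't fit in memory, so a one-vs-all loop is a practical middle ground: vectorize the inner loop with NumPy while keeping memory at O(N). A minimal sketch (function name is hypothetical; like the question's code, it counts the point itself):

```python
import numpy as np

def densities(points, threshold=0.15):
    """For each point, count points within `threshold` of it (including
    the point itself, as in the question's code). points: (N, 3) array."""
    t2 = threshold * threshold
    counts = np.empty(len(points), dtype=np.int64)
    for j, p in enumerate(points):
        # Squared distances from point j to every point, one row at a time,
        # so memory stays O(N) instead of O(N^2).
        d2 = ((points - p) ** 2).sum(axis=1)
        counts[j] = np.count_nonzero(d2 < t2)
    return counts
```

Writing the results would then be a single `np.savetxt("my_density.txt", densities(points), fmt="%d")` after loading the file with `np.loadtxt`.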
FCo
  • Shouldn't the numpy one use one point against the whole array of all points rather than still handling pairs of individual points? – Kelly Bundy Jul 21 '22 at 15:16
  • With their 500,000 points, all-vs-all is likely too large. I really meant one-vs-all (and loop for the "one"). Although they should probably also consider better algorithms (not just rely on numpy speed). – Kelly Bundy Jul 21 '22 at 15:30
  • @KellyBundy agreed. My answer is _not_ the optimal solution, just very easy suggestions to get a quick speed improvement (which may make this incomplete). I think the OP needs to benchmark and reconsider why this is being done and if there isn't already a better existing solution. – FCo Jul 21 '22 at 15:35
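As the comments suggest, a better algorithm beats micro-optimisation at this scale. One such approach (a sketch, not part of the original answer) is spatial hashing: bucket the points into cells of side r, so each point is only compared against points in its own and the 26 neighbouring cells. Unlike the question's code, this version excludes the point itself from its own count:

```python
from collections import defaultdict
from math import floor

def count_neighbors(points, r):
    """For each point, count the *other* points strictly within distance r,
    using a grid of cells of side r so only nearby cells are scanned."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        grid[(floor(x / r), floor(y / r), floor(z / r))].append(idx)
    r2 = r * r
    counts = [0] * len(points)
    for j, (xj, yj, zj) in enumerate(points):
        cx, cy, cz = floor(xj / r), floor(yj / r), floor(zj / r)
        # Any point within r of (xj, yj, zj) must fall in one of the
        # 3x3x3 cells centred on this point's cell.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for i in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        if i != j:
                            xi, yi, zi = points[i]
                            if (xi - xj)**2 + (yi - yj)**2 + (zi - zj)**2 < r2:
                                counts[j] += 1
    return counts
```

For roughly uniformly distributed points this is close to O(n) plus the cost of the actual neighbour pairs, instead of the O(n²) all-pairs scan.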

Quick ×2 speed-up – replace `i = 0` with `i = j + 1`. That way you check each pair only once, not twice.

More fundamental change – you can sort the points by coordinates and use a sliding-window algorithm. The idea is that if the points are sorted by x coordinate, the j-th point has x=1, and the i-th point has x=1.01, then they might be near each other and you should check them. But if the i-th point has x=2, then it cannot be near the j-th point, and since the points are sorted, all points after the i-th can be skipped (i.e. not checked in a pair with the j-th point).

If the points are sparse, this should significantly speed up the computation, and the complexity would be O(n·log(n)) because of the sorting.
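The idea above can be sketched as follows (a sketch under the stated assumptions, not code from the answer; note that the returned counts are in sorted order, not the input file's order, and the point itself is excluded):

```python
def densities_sorted(points, r=0.15):
    """Sliding-window neighbour count over points sorted by x.

    For each point, counts the other points strictly within distance r.
    Only points whose x coordinate lies in [xj - r, xj + r] are examined.
    """
    pts = sorted(points)               # sorts by x first (then y, then z)
    r2 = r * r
    counts = [0] * len(pts)
    start = 0
    for j, (xj, yj, zj) in enumerate(pts):
        # Advance the window start: points more than r behind in x can
        # never be within distance r of point j (or of any later point).
        while pts[start][0] < xj - r:
            start += 1
        for i in range(start, len(pts)):
            xi, yi, zi = pts[i]
            if xi > xj + r:            # everything after is even farther in x
                break
            if i != j and (xi - xj)**2 + (yi - yj)**2 + (zi - zj)**2 < r2:
                counts[j] += 1
    return counts
```

The worst case is still O(n²) (e.g. all points in one thin slab of x), but for sparse, well-spread data the window stays small and the sort dominates.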

iambackend

In the `if`, instead of taking the `sqrt` and comparing it with 0.15, compare the squared distance directly with the square of 0.15, which is 0.0225. The result will be the same, and since `sqrt` is a relatively expensive operation, skipping it will save you time.

if (points_list[i][0] - points_list[j][0])**2 + (points_list[i][1] - points_list[j][1])**2 + (points_list[i][2] - points_list[j][2])**2 < 0.0225:
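This combines naturally with hoisting the j-th point out of the inner loop, as suggested in the comments below. A sketch of the resulting inner loop as a function (the function name is hypothetical; like the question's code, it counts the point itself):

```python
def density_at(points_list, j, threshold=0.15):
    """Count points within `threshold` of point j (including j itself,
    as in the question's code), without calling sqrt."""
    threshold_sq = threshold * threshold   # compare squared distances
    xj, yj, zj = points_list[j]            # hoisted: one lookup per outer iteration
    count = 0
    for xi, yi, zi in points_list:
        if (xi - xj)**2 + (yi - yj)**2 + (zi - zj)**2 < threshold_sq:
            count += 1
    return count
```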
Nuri Tasdemir
  • How expensive is it? – Kelly Bundy Jul 21 '22 at 17:27
  • Apparently not expensive at all. Impact of this improvement appears to be far less (saves ~40 ns, ~6%) than for example assigning `points_list[j]` to `x, y, z` and using those instead (saves ~120 ns, ~19%). – Kelly Bundy Jul 21 '22 at 18:19
  • [Benchmark](https://tio.run/##zZLBbsIwDIbvfgrfkrDQpWFMGxpPUlWIjTCC1iRLgwS8fBeHISHEOK8HN/r1xf4dOxzSxrvJS4jDsI6@w2Q7YxPaLviYMJpglgmgN2kXcI6MMShYt0ybM9R/xwTBW5f6xZftU@YarqpaopI6h2oiJGbhic5TCs@iBZsxBdsca9hLPEg85vNFmmbbAtWDD78yPSUFREbFOL/EbNuoFsdXV7MmRiOND3gN1zfg@i9Y34B1gQW@5U7qKZNk6z84KoaU1idLd99qf8/J4V7l41XvLcDaR1ygdRiX7tPwiZjl8iTS4EgvA5wVT27XvZtIQ1f0FY0WprOOn5aNEy2xbJz85eennxD4@KuUiyFmfzz6nVvxhCOszWveNOZ6xKqqWG6BcjVjrWatgDMvhuEH). – Kelly Bundy Jul 21 '22 at 18:19
  • @KellyBundy you could use both optimizations at the same time. They are not exclusive – Nuri Tasdemir Jul 21 '22 at 19:21
  • Yes, and others as well. Just tried those two separately to compare their own impact. My main objective was to find out whether sqrt is actually expensive, measuring that one other optimization was just a little extra anyway. – Kelly Bundy Jul 21 '22 at 19:28