1

I have a very large 3D image stored in a matrix (approx. 500x500x40 voxels) in Matlab. In this matrix about 30000 points are selected (suppose randomly). Suppose the selected voxels have a value of one and the non-selected points are zero. Now I need to calculate for every voxel in the entire 3D-image the Euclidian distance to the nearest selected point.

So for example in 2D, given a 4x4matrix:

    selection = 0 0 1 0
                1 0 0 0
                0 0 0 1
                0 0 0 0

The corresponding distance matrix would be:

    distance = 1  1  0  1
               0  1  1  1
               1  √2 1  0
               2  √5 √2 1

Is there an efficient way to do this, in terms of both time and memory?

sdbonte
  • 102
  • 12
  • How do you define your distance? Euclidian? Are the 3 dimensions of your matrix representing coordinates in 3 directions? – BillBokeey Jun 06 '16 at 12:14
  • Distance is indeed Euclidian. The 3D matrix is a medical scan, so every "point" in the matrix is in fact a voxel. – sdbonte Jun 06 '16 at 12:33
  • There are many possibilities, see i.e.: https://en.wikipedia.org/wiki/Nearest_neighbor_search . I was thinking about octree's, but I have no relevant experience with it. – Bernhard Jun 06 '16 at 13:13
  • Reshape your matrix into a `250000 x 40` matrix, then sample from this matrix to get a `30000 x 40` matrix of points, then use the above marked duplicate. – rayryeng Jun 06 '16 at 15:00
  • I don't think this will work. I need for every voxel in the matrix (so 500*500*40 = 10M samples) to calculate the distance to the nearest of 30k points. Using the method you mention would require to store a 10M*30k matrix, which is not possible. – sdbonte Jun 06 '16 at 15:24
  • 1
    @sdbonte Right it didn't hit me about the memory storage until you mentioned it and I apologize for not seeing it properly before flagging. I'll remove the duplicate. I also thought that each 3D slice at one spatial coordinate was a data point, not the actual voxel itself. Because of the huge memory requirement, you may have to resort to storing on disk or splitting up the operation over multiple processors / machines. Either way, it's going to be rather slow. – rayryeng Jun 06 '16 at 15:37

2 Answers2

1

If you don't need to know which combinations had which distances, then you could calculate the 3D cross-correlation.

To illustrate this in 2D, take the following matrix and calculate the 2D correlation as described in the reference

M =
 0     0     0
 1     0     0
 1     0     0

convn(v,v(end:-1:1,end:-1:1)) =
 0     0     0     0     0
 0     0     1     0     0
 0     0     2     0     0
 0     0     1     0     0
 0     0     0     0     0

One can read off the distances between the ones because the correlation matrix here can be understood as the differences in indices. The central column in convn means the horizontal distance is zero. Likewise, the middle row means zero vertical distance. As a result, the central value gives you the amount of zero distance values, which is the sum of ones in your matrix. The two ones correspond to a vertical distance of 1 between the ones in M. One combination has a positive distance and the other combination has a negative distance.

As a result, you now have all the distances, but they contain the horizontal and vertical directions too. But you can still process that as you wish.

A way to calculate all possible squared distances without a for loop would be

n = size(M,1) = 
 3

tmp = repmat([-(n-1):(n-1)].^2,2*n-1,1)
d2 = tmp+tmp' =
 8     5     4     5     8
 5     2     1     2     5
 4     1     0     1     4
 5     2     1     2     5
 8     5     4     5     8

Both matrices together contain basically the histogram of distances.

Community
  • 1
  • 1
Hans
  • 46
  • 4
  • Thanks for the advice, but how would one use this in practice? I'm indeed not interested in the combinations. But I don't really see how I can use the `convn` command for this problem. – sdbonte Jun 06 '16 at 12:48
  • I don't think this will work. For instance, for the selected point, given in matrix M: `M = [0 0 0; 1 0 0; 1 0 0]`, the output would be: `distance = [1 sqrt(2) sqrt(3); 0 1 2; 0 1 2]`. If I use the `convn` command, I still would need to calculate the Euclidian distances myself, and perform a for-loop over all 10M voxels, which would be very slow. – sdbonte Jun 07 '16 at 08:11
0

If your points are given as coordinates X = n x 3, you can use

D = pdist(X,'euclidean')

to efficiently calculate all combinations of distances.

Hans
  • 46
  • 4
  • If I store all my voxels (10M) and selected points (30k) in a vector `X` of size `(10M+30k) x 3` and run the `pdist`command, this would result in a matrix of size `10M*(10M-1)/2 = 5e13`, which gives ofcourse "Out of memory" issues. – sdbonte Jun 07 '16 at 10:08