Can this code be vectorised further to eliminate loop?

Question

I am working on a ray-tracing geometry problem in MATLAB and have reached a bottleneck in my program.

The function takes in the start and end points of a ray (lStart and lEnd), a set of plane-points and normals (pPoint and norms). The function then computes the distance along the ray at which it intersects each of the planes.

Here is a reference to the original equation: https://en.wikipedia.org/wiki/Line%E2%80%93plane_intersection#Algebraic_form

The code I have so far is as follows:

dists = (diag(bsxfun(@minus, pPoint, lStart) * norms')) ./ ((lEnd - lStart) * norms')';

Which was called in a loop such as:

nRays   = size(lStart, 1);
nPlanes = size(pPoint, 1);
dists   = zeros(nPlanes, nRays);

for rayCtr = 1:nRays

    dists(:, rayCtr) = (diag(bsxfun(@minus, pPoint, lStart(rayCtr, :)) * norms')) ./...
         ((lEnd(rayCtr, :) - lStart(rayCtr, :)) * norms')';

end

This works perfectly well for a single ray.

Given one ray [1 x 3] and 300 planes [300 x 3], I get a [300 x 1] matrix with the distance of each plane intersection.

What I am struggling with is, vectorising this to work on a list of rays.

Sizes in a typical dataset are:

lStart, lEnd  = [14e6, 3];
pPoint, norms = [300,  3];

The ray processing is usually batched into tens of thousands - to fit in memory. For each batch, I'd like to eliminate the rayCtr loop. With this method the entire program takes just over 8 hours (32-bit, Windows, 2GB RAM).

Here are some coordinates for six planes (forming a cuboid) and two rays as a MWE:

pPoint = [-0.5000   -0.5000   -0.5000;
          -0.5000   -0.5000    0.5000;
          -0.5000   -0.5000   -0.5000;
          -0.5000    0.5000   -0.5000;
          -0.5000   -0.5000   -0.5000;
           0.5000   -0.5000   -0.5000]

norms = [0  0   1;
         0  0   1;
         0  1   0;
         0  1   0;
         1  0   0;
         1  0   0]

lStart = [-1 0 0;
          -1 0.25 0]

lEnd   = [1 0 0;
          1 0.25 0]

The expected output from the example is:

dists = [-Inf -Inf;
          Inf  Inf; 
         -Inf -Inf; 
          Inf  Inf; 
          0.25 0.25; 
          0.75 0.75]

Many thanks.

UPDATE: With the solutions proposed in the accepted answer, runtime is down to approximately 30 mins - now limited by read-write operations and voxel lookup.

Can you give a small numerical example? Which are the 1x3 and 300x3 variables? — Luis Mendo, Nov 09 '15 at 13:12
Can you add a for-loop code to solve it, so that people could verify back any optimization possible? — Divakar, Nov 10 '15 at 20:28
@Divakar Sorry, should I add in the for loop I was using originally to show how the code previously worked? — Adam Sroka, Nov 11 '15 at 09:33
@AdamSroka Yes, it would be great if you can add the code that you used before vectorization to run through the entire dataset. — Divakar, Nov 11 '15 at 09:52
Can you also add the sizes of the variables `pPoint`, `norms`, `lStart` and `lEnd` as used in the actual dataset? — Divakar, Nov 11 '15 at 11:00

score 11 · Accepted Answer · edited May 23 '17 at 12:10

11

I think what you need is

dists=sum(bsxfun(@times,bsxfun(@minus,...
                               permute(pPoint,[1 3 2]),permute(lStart,[3 1 2])),...
                 permute(norms,[1 3 2])),3)...
        ./(sum(bsxfun(@times,...
                      permute(lEnd-lStart,[3 1 2]),permute(norms,[1 3 2])),3))

This assumes that pPoint and norms are size [nlay 3], while lStart and lEnd are size [nray 3]. The result is of size [nlay nray], each corresponding to a (layer,ray) pair.

This gives the correct result for your example:

dists =

      -Inf      -Inf
       Inf       Inf
      -Inf      -Inf
       Inf       Inf
    0.2500    0.2500
    0.7500    0.7500

Here's another way to introduce some fast matrix-multiplication into play for the denominator part calculations -

p1 = sum(bsxfun(@times,bsxfun(@minus,pPoint,permute(lStart,[3 2 1])),norms),2)
p2 = norms*(lEnd - lStart).'
dists = squeeze(p1)./p2

Since lStart is listed as a heavy dataset, it might be better to keep it as it is and permute things around it. Thus, one alternative approach to get squeeze(p1) would be with -

squeeze(sum(bsxfun(@times,bsxfun(@minus,permute(pPoint,[3 2 1]),lStart),permute(norms,[3 2 1])),2)).'

edited May 23 '17 at 12:10

Community

1
1

answered Nov 09 '15 at 13:38

Andras Deak -- Слава Україні

33,737
11
83
111

wow, quick work! This worked fine for me, once my run with real data completes in a few hours, I'll trial again with this. If it works I'll dance a small jig, then accept your answer and be eternally grateful. – Adam Sroka Nov 09 '15 at 13:45
@AdamSroka unfortunately quick is useless if it's wrong. Luis Mendo will probably come up with a proper solution while I try to debug mine:) – Andras Deak -- Слава Україні Nov 09 '15 at 13:46
Sorry, I think I gave you the wrong inputs in the MWE – Adam Sroka Nov 09 '15 at 13:52
@AdamSroka unfortunately I still don't get what you expect. But in your example, none of the vectors seem to be perpendicular to any of the normal vectors. This means that I don't get any zeros in the denominator, so I don't get infinities. But `dot(lEnd(1,:)-lStart(1,:),norms(1,:))` isn't zero either. Are you sure your new MCVE is correct? – Andras Deak -- Слава Україні Nov 09 '15 at 13:58
No, getting over excited and being a complete doughnut I've mixed up test cases - really sorry for wasting your time! – Adam Sroka Nov 09 '15 at 14:02
1

@AdamSroka no problem at all, I'm glad that it was working all along:) Which is especially good, as I couldn't find my error for the life of me:D – Andras Deak -- Слава Україні Nov 09 '15 at 14:03
1

@AdamSroka, also, if time is crucial, you should try the original implementation as well, where the ray index was first in the operations (just switch `1 3` to `3 1` and vice versa in every `bsxfun` call). Due to matlab's column-major memory access, it's possible that there's a reasonable performance difference (since one of the indices is a few hundreds, the other millions). Then of course the output matrix is also transposed, which can be reversed by a single transpose if necessary. – Andras Deak -- Слава Україні Nov 09 '15 at 14:06
I'm still baffled at the speed with which you answered, think I need to study and practice permute a lot more. – Adam Sroka Nov 09 '15 at 14:06
@AdamSroka thanks, but I'm still just learning. Masters like Luis Mendo and Divakar perform incredible things with `bsxfun` and `permute` and `reshape`, and in fact it's probable that they could give a more efficient solution than mine. – Andras Deak -- Слава Україні Nov 09 '15 at 14:07
So with this vectorised, the total run time of the entire program (considering 300 planes and ~14,000,000 rays) went from just over 8 hours down to 29 minutes (voxel lookup is now my major bottleneck). I'll run it again with the new denominator code and see how it gets on. Thanks again! – Adam Sroka Nov 11 '15 at 09:38
@AdamSroka Sorry to bother you guys again on this. Added a small edit for maybe better performance. – Divakar Nov 11 '15 at 11:57
@Divakar Thanks again! I'll give it a run and see how it gets along. My head is now officially fried from trying to learn to abuse permute properly. – Adam Sroka Nov 11 '15 at 12:12
@AdamSroka I have heard that quite often from people getting *exposed* to `permute` :D It takes some getting used to I guess. – Divakar Nov 11 '15 at 12:18
@Divakar I'm also thinking about that numerator... By separating the minus into 2 dot products, one would be easy and the other matrix-mul-able. – Andras Deak -- Слава Україні Nov 11 '15 at 15:53
@AndrasDeak Give it a try. But I was reluctant about it, because with matrix-mul we would only *cut* the dimension that has a length of `3`, so I thought maybe it's not worth it. – Divakar Nov 11 '15 at 16:26

Can this code be vectorised further to eliminate loop?

1 Answers1

Linked