matlab code optimization - multiply all 3x3 matrices in a large cell array with each other

Question

Summary

Looking to increase time efficiency of my code, repeatedly performing matrix multiplication between a 3x3 matrix and an inverse 3x3 matrix - using mldivide.

Background

I am trying to implement a vector quantization method before using orientation data for hand-eye calibration between sensors attached to a subject's lower limb in gait analysis... The algorithm I'm following is from the paper "Data Selection for Hand-eye Calibration: A Vector Quantization Approach" That's background that you probably don't need...

Code to Optimize

I was hoping to find a faster method of solving all possible "relative movements" (A or B), which takes too long (C and D are around 2000 elements long, therefore the size of A or B will get up to =2000*(2000-1)/2=1999000):

%C,D is cell array, each cell is 3x3 rotation matrix.
Nframe=length(C);
Nrel = Nframe*(Nframe-1)/2;
A=cell(Nrel,1);
B=cell(Nrel,1);
At=zeros(Nrel,4);
Bt=zeros(Nrel,4);
count = 1;
for i=1:Nframe
    for j=1:Nframe
        if j <= i
            continue;
        else
            % Main place to optimize (avoid looping?)!!
            % In DCM representation (ie. each cell is 3x3 matrix)
            A{count,1} = C{j}\C{i}; %=inv(C{i+x})*C{i}
            B{count,1} = D{j}\D{i};

            %Using the following adds to ~ 4000% to the time (according to tic toc).
            %Represent as axis-angle (ie. each cell -> 1x4 vec) - perhaps compute later
            At(count,:) = SpinConv('DCMtoEV',A{count}); %on Matlab File Exchange
            Bt(count,:) = SpinConv('DCMtoEV',B{count});
            count=count+1;
        end
    end
end

Hope this is the right place to ask, and I was unable to find a previous solution I could apply. Also, I have no real experience, so I'm unsure if the computational time is just unavoidable when working with large matrices.

--------------------------------------------------------------------------------------------

*EDIT*

Matrix Properties: Rotational, so as commented below - they are 'nice', not singular. They are in Special Orthogonal Group, SO(3) [transpose=inverse]. See http://en.wikipedia.org/wiki/Rotation_matrix#Properties_of_a_rotation_matrix

Method to test: To create random rotation matrices, R, use the following code:

[U,S,V] = svd(randn(3,3)); 
R = U∗V';
if det(R) < 0 
    S(1 ,1) = 1; 
    S(2 ,2) = 1;
    S(3,3) = −1; 
    R = U∗S∗V’;
end

SpinConv: I am just using it to convert from 3x3 directional cosine matrix to axis-angle representation. It is more involved, and converts more than necessary for stability (to quaternions first). Here's the link: http://www.mathworks.com/matlabcentral/fileexchange/41562-spinconv/content/SpinConv.m Here's all that needs to be done (not in SpinConv - just implemented the method quickly):

t = (trace(R)-1)/2;
% this is only in case of numerical errors
if t < -1,
    t = -1;
elseif t>1,
    t = 1;
end
theta = acosd(t);
if isequalf(180,theta) || isequalf(0,theta),
    axis = null(R-eye(3));
    axis = axis(:,1)/norm(axis(:,1));
else
    axis = [R(3,2) - R(2,3); R(1,3) - R(3,1); R(2,1) - R(1,2) ];
    axis = axis/(2*sind(theta));
end
At(count,:) = [-axis,theta]; %% NOTE (-ve) as noted in comments of correct answer.

*EDIT #2* Just realized, alternatively, I can use quaternion to avoid using 3x3 matrices:

So quaternion is a 1x4 vector. Original code can be changed to (inside else statement):

A(count,:) = qnorm(qmult(qconj(C(j,:)),C(i,:)));
vec = [q(1) q(2) q(3)]/norm([q(1) q(2) q(3)]);
theta = 2*atan2(norm([q(1) q(2) q(3)]),q(4));
At(count,:)=[vec,theta];

where qconj, qmult, and qnorm are quaternion operations.

Alright, so sorry - that's all the info and possibilities I have.

You haven't shown the code that's slow. You haven't even linked to it. Suggesting that everyone who reads this question go and browse through MATLAB File Exchange until they find the particular SpinConv function you're using is a non-starter. — Ben Voigt, Jul 15 '13 at 20:33
Couple of observations: 1) you don't preallocate `A` and `B`, which can slow down the loop. 2) you call `SpinConv`, an external function; the loops can therefore not be JIT'ed for optimal performance. If the relevant parts of `SpinConv` are small enough, just copy them in-place to speed up the loop. 3) You don't say anything about your 3x3 matrices; are they symmetric? Sparse? Diagonal? Such properties determine the best method to speed up your calculations. — Rody Oldenhuis, Jul 16 '13 at 06:13
@BenVoight: Sorry for the lack of details: SpinConv is not what I'm concerned about right now, since the inv(C)*C is what I know you guys could help with - sorry that wasn't clear - I thought I'd worry about the SpinConv later. — Jon, Jul 16 '13 at 12:51
@RodyOldenhuis: The matrices are rotational matrices, so symmetric, with nonzero values in each element, homogeneous, orthogonal (inv and transpose are same), det(R)=+1 etc... I can remove SpinConv to another loop and "JIT" (just learned that) the inv(C)*C part?? — Jon, Jul 16 '13 at 12:52
Well, if inverse and transpose are the same, get rid of that `mldivide` and use multiplication instead! I thought from your comment that `SpinConv` was the slow part, but indeed it probably is just interfering with loop optimizations, so definitely move it to another loop. — Ben Voigt, Jul 16 '13 at 14:14

Rody Oldenhuis · Accepted Answer · 2013-07-16T15:16:57.377

As I commented above, the fastest method strongly depends on the properties of your matrices. For example, some algorithm can strongly benefit from the matrix being symmetric, but is rather slow if it is not.

So without further information, I can only make some general statements, and compare some methods on random matrices (which usually does not give a good comparison in the context of matrices inverses).

Depending on your MATLAB version (the JIT in R2011a or so greatly improved it), pre-allocating A and B gives you a large gain in loop performance; growing arrays dynamically inside a loop is usually very inefficient.

Along the same lines is the call to SpinConv: as this is an external function (MEX or m, doesn't matter), JIT cannot compile this loop, so you're limited by the speed of the interpreter. Which is rather low. If at all possible, you can avoid that by simply copy-pasting the relevant parts of SpinConv into the loop body. I know, it's hugely annoying (and I sure hope this is automated in future versions of MATLAB), but for now it is the only way to get JIT understand the loop structure and compile it (really, factors of 100 or more are not uncommon).

So, having said all that, I tested 2 different methods:

compute and store the LU decompositions of C and D, and re-use them in the loop
Solve the systems Cn \ [C{1} C{2} ... C{n-1}] for all n = 2:N, and re-order

In code:

clc
clear all

Nframe = 500;

%// Some random data
C = cellfun(@(~)rand(3), cell(Nframe,1), 'UniformOutput', false); 
D = cellfun(@(~)rand(3), cell(Nframe,1), 'UniformOutput', false);


%// Your original method 
tic

count  = 1;
for i=1:Nframe
    for j=1:Nframe
        if j <= i
            continue;
        else
            %// Main place to optimize (avoid looping?)!!
            %// In DCM representation (ie. each cell is 3x3 matrix)
            A{count,1} = C{j}\C{i}; %=inv(C{i+x})*C{i}
            B{count,1} = D{j}\D{i};

            count=count+1;
        end
    end
end

toc 
A1 = A;



%// First method: compute all LU decompositions and re-use them in the loop
%// ------------------------------------------------------------------------

tic

%// Compute LU decompositions of all C and D
Clu = cell(Nframe, 2);
Dlu = cell(Nframe, 2);
for ii = 1:Nframe
    [Clu{ii,1:2}]  = lu(C{ii});
    [Dlu{ii,1:2}]  = lu(D{ii});
end

%// improvement: pre-allocate A and B
A = cell(Nframe*(Nframe-1)/2, 1);
B = cell(Nframe*(Nframe-1)/2, 1);

%// improvement: don't use i and j as variable names
count  = 1;
for ii = 1:Nframe

    %// improvement: instead of continue if j<=i, just use different range
    for jj = ii+1 : Nframe

        %// mldivide for LU is equal to backwards substitution, which is
        %// trivial and thus fast
        A{count} = Clu{jj,2}\(Clu{jj,1}\C{ii});
        B{count} = Dlu{jj,2}\(Dlu{jj,1}\D{ii});

        count = count+1;

    end
end

toc
A2 = A;



%// Second method: solve all systems simultaneously by concatenation
%// ------------------------------------------------------------------------

tic

% Pre-allocate temporary matrices
Aa = cell(Nframe-1, 1);
Bb = cell(Nframe-1, 1);

for ii = 2:Nframe   
    % Solve Cn \ [C1 C2 C3 ... Cn]
    Aa{ii-1} = C{ii}\[C{1:ii-1}];
    Bb{ii-1} = D{ii}\[D{1:ii-1}];
end
toc

%// Compared to the order in which data is stored in one of the other
%// methods, the order of data in Aa and Bb is different. So, we have to
%// re-order to get the proper order back: 

tic

A = cell(Nframe*(Nframe-1)/2, 1);
B = cell(Nframe*(Nframe-1)/2, 1);
for ii = 1:Nframe-1

     A( (1:Nframe-ii) + (Nframe-1-(ii-2)/2)*(ii-1) ) = ...
         cellfun(@(x) x(:, (1:3) + 3*(ii-1)), Aa(ii:end), 'UniformOutput', false);

     B( (1:Nframe-ii) + (Nframe-1-(ii-2)/2)*(ii-1) ) = ...
         cellfun(@(x) x(:, (1:3) + 3*(ii-1)), Bb(ii:end), 'UniformOutput', false);
end

toc
A3 = A;

% Check validity of outputs
allEqual = all( cellfun(@(x,y,z)isequal(x,y)&&isequal(x,z), A1,A2,A3) )

Results:

Elapsed time is 44.867630 seconds.  %// your original method 
Elapsed time is 1.267333 seconds.   %// with LU decomposition
Elapsed time is 0.183950 seconds.   %// solving en-masse by concatenation
Elapsed time is 1.871149 seconds.   %// re-ordering the output of that

allEqual = 
    1

Note that I'm on R2010a, so that the slowness of your original method is primarily due to A and B not being pre-allocated. Note that performance on newer MATLAB versions will be better in this regard, but still, performance will be better if you pre-allocate.

Intuitively (and as others will probably suggest), you could compute the explicit inverses,

Cinv = cellfun(@inv, C, 'UniformOutput', false);

or even

Cinv = cellfun(@(x) [...
    x(5)*x(9)-x(8)*x(6)  x(7)*x(6)-x(4)*x(9)  x(4)*x(8)-x(7)*x(5) 
    x(8)*x(3)-x(2)*x(9)  x(1)*x(9)-x(7)*x(3)  x(7)*x(2)-x(1)*x(8)
    x(2)*x(6)-x(5)*x(3)  x(4)*x(3)-x(1)*x(6)  x(1)*x(5)-x(4)*x(2)] / ...
        (x(1)*x(5)*x(9) + x(4)*x(8)*x(3) + x(7)*x(2)*x(6) - ...
         x(7)*x(5)*x(3) - x(4)*x(2)*x(9) - x(1)*x(8)*x(6)),...
    C, 'UniformOutput', false);

(which will be faster and more accurate), and then simply multiply inside the loop. As you will see, this is significantly slower than both the en-masse solve Cn\[C1 C2 ... Cn-1] and LU (although that depends on the nature of the matrices). Also, it fails produce allEqual == true; sometimes the differences are small, but often (especially for near-singular matrices and other specials), the differences are huge.

As mentioned in a lot of other questions here on SO, and as any refined Google search or advanced Linear Algebra book will tell you, using explicit inverses in numerical applications will usually be slow, always be inaccurate, and sometimes even dangerous. The inverse is a very nice theoretical construct, but pretty much useless in any practical application of that theory. Therefore, it is better you use one of the other methods mentioned above.

In conclusion:

If you can live with the data out of order (which probably necessitates more complex indexing later on), solving the systems en-masse by concatenation is fastest by far. Granted, the way I re-order the data can be improved upon, but I suspect LU will still always be faster if you need to re-order.
If this is not the case, but your matrices are suited for LU decomposition, use that. To find out if this is the case, just use it on your real data and profile. You can also try the additional outputs of LU (most notably, the permutation matrix P, or for sparse matrices, the column reordering matrix Q).
Of course, if QR decomposition is more appropriate, use qr. Same for chol, or pcg, etc. Experiment with different methods a bit.

EDIT:

As you mention, all the matrices are SO(3) rotation matrices. Wow, is that ever important information!! In that case, the inverse is just the transpose, which is one or two orders of magnitude faster than any variant of the inverse. Also, you indicate that you want to convert those rotation matrices to axis-angle representation. The kernel should then be changed to something along the lines of

A = C{ii}.'*C{jj};
B = D{ii}.'*D{jj};

[V,lam] = eig(A);
axis  = V(:,imag(diag(lam))==0);
theta = acos(min(max(-1, (trace(A)-1)/2), 1));
At{count, :} = {axis theta*180/pi};

[V,lam] = eig(B);
axis  = V(:,imag(diag(lam))==0);
theta = acos(min(max(-1, (trace(B)-1)/2), 1));
Bt{count, :} = {axis theta*180/pi};

This uses only built-in functions, so this should be pretty efficient. At least it's better than copy-pasting SpinConv, since SpinConv uses a lot of non-builtin functions (null, isequalf, acosd, sind). Note that the method above uses the eigenvalue method; you can make it slightly more efficient if you use the determinant method used in the SpinConv function, provided you update it to have no calls to non-built-ins.

BEWARE: it appears that that version of SpinConv has an incorrect sign of the axis; the sign of the axis computed in the elseif is opposite that of the axis computed in the if.

Note: the code I added above is not from SpinConv, which is too complex to put here (see the link for details). The code I posted was from a another function converting rotation matrices to axis-angle... So if anyone finds this, you can still trust SpinConv (which doesn't use those non-built-ins). — Jon, Jul 16 '13 at 15:31
... It still takes over 10 minutes to compute. Guess that's life. — Jon, Jul 16 '13 at 16:08
@Jon: I can take a look at your complete code, to identify potential speedups...If you're interested, send all relevant m-files to [my last name] AT gmail DOT com — Rody Oldenhuis, Jul 17 '13 at 07:15

score 1 · Answer 2 · answered Jul 15 '13 at 20:31

1

I would try computing and storing inv(C{j}) since C{j} appears in a matrix divide multiple times. Ditto D{j}. Or are your 3x3 matrices singular?

answered Jul 15 '13 at 20:31

Ben Voigt

277,958
43
419
720

If the matrices are 'nice' enough it may be fine to calculate their inverse, but in general the use of `inv` is not recommended if it can be avoided. http://stackoverflow.com/questions/1419580/why-is-matlabs-inv-slow-and-inaccurate – Dennis Jaheruddin Jul 16 '13 at 07:59
@Dennis: You're right, LU decomposition is better. I was just noting that this effort is repeated many many times and so should be precomputed. – Ben Voigt Jul 16 '13 at 14:08

matlab code optimization - multiply all 3x3 matrices in a large cell array with each other

--------------------------------------------------------------------------------------------

2 Answers2