8

I have two 3-dimensional arrays, the first two dimensions of which represent matrices and the last one counts through a parameterspace, as a simple example take

A = repmat([1,2; 3,4], [1 1 4]);

(but assume A(:,:,j) is different for each j). How can one easily perform a per-j matrix multiplication of two such matrix-arrays A and B?

C = A; % pre-allocate, nan(size(A,1), size(B,2)) would be better but slower
for jj = 1:size(A, 3)
  C(:,:,jj) = A(:,:,jj) * B(:,:,jj);
end

certainly does the job, but if the third dimension is more like 1e3 elements this is very slow since it doesn't use MATLAB's vectorization. So, is there a faster way?

Tobias Kienzler
  • 25,759
  • 22
  • 127
  • 221
  • Have you actually timed the loop? For resent Matlab versions it might be quite fast. How much faster you expect the 'vectorized' version to bee? Thanks – eat Jul 05 '11 at 10:19
  • @eat: for 1000 parameters, it's a factor of 7 (MATLAB R2010a) and I'm using this inside an optimization loop, so it's important - I found a solution now, I'll post it after lunch – Tobias Kienzler Jul 05 '11 at 10:30
  • 1
    possible duplicate of [Multiply a 3D matrix with a 2D matrix](http://stackoverflow.com/questions/1745299/multiply-a-3d-matrix-with-a-2d-matrix) – gnovice Jul 05 '11 at 14:58
  • @TobiasKienzler: I assume you are pre-allocating the matrix `C`?? – Amro Jul 05 '11 at 16:45

5 Answers5

6

I did some timing tests now, the fastest way for 2x2xN turns out to be calculating the matrix elements:

C = A;
C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);

In the general case it turns out the for loop is actually the fastest (don't forget to pre-allocate C though!).

Should one already have the result as cell-array of matrices though, using cellfun is the fastest choice, it is also faster than looping over the cell elements:

C = cellfun(@mtimes, A, B, 'UniformOutput', false);

However, having to call num2cell first (Ac = num2cell(A, [1 2])) and cell2mat for the 3d-array case wastes too much time.


Here's some timing I did for a random set of 2 x 2 x 1e4:

 array-for: 0.057112
 arrayfun : 0.14206
 num2cell : 0.079468
 cell-for : 0.033173
 cellfun  : 0.025223
 cell2mat : 0.010213
 explicit : 0.0021338

Explicit refers to using direct calculation of the 2 x 2 matrix elements, see bellow. The result is similar for new random arrays, cellfun is the fastest if no num2cell is required before and there is no restriction to 2x2xN. For general 3d-arrays looping over the third dimension is indeed the fastest choice already. Here's the timing code:

n = 2;
m = 2;
l = 1e4;

A = rand(n,m,l);
B = rand(m,n,l);

% naive for-loop:
tic
%Cf = nan(n,n,l);
Cf = A;
for jl = 1:l
    Cf(:,:,jl) = A(:,:,jl) * B(:,:,jl);
end;
disp([' array-for: ' num2str(toc)]);

% using arrayfun:
tic
Ca = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:size(A,3), 'UniformOutput',false);
Ca = cat(3,Ca{:});
disp([' arrayfun : ' num2str(toc)]);

tic
Ac = num2cell(A, [1 2]);
Bc = num2cell(B, [1 2]);
disp([' num2cell : ' num2str(toc)]);

% cell for-loop:
tic
Cfc = Ac;
for jl = 1:l
    Cfc{jl} = Ac{jl} * Bc{jl};
end;
disp([' cell-for : ' num2str(toc)]);

% using cellfun:
tic
Cc = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
disp([' cellfun  : ' num2str(toc)]);

tic
Cc = cell2mat(Cc);
disp([' cell2mat : ' num2str(toc)]);

tic
Cm = A;
Cm(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
Cm(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
Cm(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
Cm(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
disp([' explicit : ' num2str(toc)]);

disp(' ');
Tobias Kienzler
  • 25,759
  • 22
  • 127
  • 221
  • 1
    Clever indeed. You may indeed need later to accept your own answer ;). Thanks – eat Jul 05 '11 at 11:42
  • 1
    Don't be fooled by CELLFUN, there is a hidden loop inside... So its really simpler to just write: `C = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:size(A,3), 'UniformOutput',false); C = cat(3,C{:});`. Both are not really better than the original for-loop! – Amro Jul 05 '11 at 16:43
  • @Amro: you're right, I did timing tests now. `arrayfun` was almost exactly as fast/slow as `num2cell + cellfun + cell2mat`, turns out the original for-loop is really the fastest (and yes, I pre-allocated `C`) unless you already have cells – Tobias Kienzler Jul 12 '11 at 11:43
  • 1
    @TobiasKienzler: I posted some benchmark tests of my own... As expected, FOR-loops are pretty fast, especially with the Just-in-Time (JIT) accelerator improvements in recent versions of MATLAB – Amro Jul 14 '11 at 00:47
4

Here is my benchmark test comparing the methods mentioned in @TobiasKienzler answer. I am using the TIMEIT function to get more accurate timings.

function [t,v] = matrixMultTest()
    n = 2; m = 2; p = 1e5;
    A = rand(n,m,p);
    B = rand(m,n,p);

    %# time functions
    t = zeros(5,1);
    t(1) = timeit( @() func1(A,B,n,m,p) );
    t(2) = timeit( @() func2(A,B,n,m,p) );
    t(3) = timeit( @() func3(A,B,n,m,p) );
    t(4) = timeit( @() func4(A,B,n,m,p) );
    t(5) = timeit( @() func5(A,B,n,m,p) );

    %# check the results
    v = cell(5,1);
    v{1} = func1(A,B,n,m,p);
    v{2} = func2(A,B,n,m,p);
    v{3} = func3(A,B,n,m,p);
    v{4} = func4(A,B,n,m,p);
    v{5} = func5(A,B,n,m,p);
    assert( isequal(v{:}) )
end

%# simple FOR-loop
function C = func1(A,B,n,m,p)
    C = zeros(n,n,p);
    for k=1:p
        C(:,:,k) = A(:,:,k) * B(:,:,k);
    end
end

%# ARRAYFUN
function C = func2(A,B,n,m,p)
    C = arrayfun(@(k) A(:,:,k)*B(:,:,k), 1:p, 'UniformOutput',false);
    C = cat(3, C{:});
end

%# NUM2CELL/FOR-loop/CELL2MAT
function C = func3(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cell(1,1,p);
    for k=1:p
        C{k} = Ac{k} * Bc{k};
    end;
    C = cell2mat(C);
end

%# NUM2CELL/CELLFUN/CELL2MAT
function C = func4(A,B,n,m,p)
    Ac = num2cell(A, [1 2]);
    Bc = num2cell(B, [1 2]);
    C = cellfun(@mtimes, Ac, Bc, 'UniformOutput', false);
    C = cell2mat(C);
end

%# Loop Unrolling
function C = func5(A,B,n,m,p)
    C = zeros(n,n,p);
    C(1,1,:) = A(1,1,:).*B(1,1,:) + A(1,2,:).*B(2,1,:);
    C(1,2,:) = A(1,1,:).*B(1,2,:) + A(1,2,:).*B(2,2,:);
    C(2,1,:) = A(2,1,:).*B(1,1,:) + A(2,2,:).*B(2,1,:);
    C(2,2,:) = A(2,1,:).*B(1,2,:) + A(2,2,:).*B(2,2,:);
end

The results:

>> [t,v] = matrixMultTest();
>> t
t =
      0.63633      # FOR-loop
      1.5902       # ARRAYFUN
      1.1257       # NUM2CELL/FOR-loop/CELL2MAT
      1.0759       # NUM2CELL/CELLFUN/CELL2MAT
      0.05712      # Loop Unrolling

As I explained in the comments, a simple FOR-loop is the best solution (short of loop unwinding in the last case, which is only feasible for these small 2-by-2 matrices).

Community
  • 1
  • 1
Amro
  • 123,847
  • 25
  • 243
  • 454
  • I'm afraid you just got the checkmark stolen by [Ali's answer](http://stackoverflow.com/a/32468362/321973) introducing the MMX toolbox, which didn't exist before 2012... – Tobias Kienzler Sep 09 '15 at 05:06
  • @TobiasKienzler ah that's ok. After all, it's hard to beat C code! I looked at the source code of the MMX toolbox, and it is basically creating threads (as many as there are processors) each calling a matrix multiplication function over the matrix slice it was assigned. In case you enabled optimization when compiling, it will use the `dgemm` BLAS routine (from Intel MKL library which ships with MATLAB) in order to perform the matrix-multiplication, this is the same routine MATLAB uses internally. – Amro Sep 09 '15 at 08:54
  • 1
    ... That said, for small 2x2 matrices, you should watch out for oversubscription (MKL that ships with MATLAB is itself multithreaded, at the same time MMX toolbox is calling it from multiple threads). You might actually get even better performance by using a library optimized for small mat-mult (BLAS really shines for large matrices). You can see this fact in Ali's timing; MMX took almost the same time as the loop unrolled version. Now imagine the same code implemented in C! IMO the problem is memory-bound not CPU-bound, and threads are less effective here, it's all about good cache reuse. – Amro Sep 09 '15 at 08:55
3

I highly recommend you use the MMX toolbox of matlab. It can multiply n-dimensional matrices as fast as possible.

The advantages of MMX are:

  1. It is easy to use.
  2. Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
  3. It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
  4. It uses C compiler and multi-thread computation for speed up.

For this problem, you just need to write this command:

C=mmx('mul',A,B);

I added the following function to @Amro's answer

%# mmx toolbox
function C=func6(A,B,n,m,p)
    C=mmx('mul',A,B);
end

I got this result for n=2,m=2,p=1e5:

    1.6571 # FOR-loop
    4.3110 # ARRAYFUN
    3.3731 # NUM2CELL/FOR-loop/CELL2MAT
    2.9820 # NUM2CELL/CELLFUN/CELL2MAT
    0.0244 # Loop Unrolling
    0.0221 # MMX toolbox  <===================

I used @Amro's code to run the benchmark.

rayryeng
  • 102,964
  • 22
  • 184
  • 193
Ali Mirzaei
  • 1,496
  • 2
  • 16
  • 27
1

An even faster method, in my experience, is to use dot multiplication and summation over the three-dimensional matrix. The following function, function z_matmultiply(A,B) multiplies two three dimensional matrices which have the same depth. Dot multiplication is done in a manner that is as parallel as possible, thus you might want to check the speed of this function and compare it to others over a large number of repetitions.

function C = z_matmultiply(A,B)

[ma,na,oa] = size(A);
[mb,nb,ob] = size(B);

%preallocate the output as we will do a loop soon
C = zeros(ma,nb,oa);

%error message if the dimensions are not appropriate
if na ~= mb || oa ~= ob
    fprintf('\n z_matmultiply warning: Matrix Dimmensions Inconsistent \n')
else

% if statement minimizes for loops by looping the smallest matrix dimension 
if ma > nb
    for j = 1:nb
        Bp(j,:,:) = B(:,j,:);
        C(:,j,:) = sum(A.*repmat(Bp(j,:,:),[ma,1]),2);
    end
else
    for i = 1:ma
        Ap(:,i,:) = A(i,:,:);
        C(i,:,:) = sum(repmat(Ap(:,i,:),[1,nb]).*B,1);
    end 
end

end
Ruben
  • 11
  • 1
  • 1
    you can use `bsxfun` instead of `repmat`. – Shai Oct 22 '13 at 15:09
  • 2
    It is best [not to use `i` and `j` as variable names in Matlab](http://stackoverflow.com/questions/14790740/using-i-and-j-as-variables-in-matlab). – Shai Oct 22 '13 at 15:10
1

One technique would be to create a 2Nx2N sparse matrix and embed on the diagonal the 2x2 matrices, for both A and B. Do the product with sparse matrices and take the result with slightly clever indexing and reshape it to 2x2xN.

But I doubt this will be faster than simple looping.

eat
  • 7,440
  • 1
  • 19
  • 27