2

Say i have two martrices:

A=50;
B=50;
C=1000;
X = rand(A,B);
Y = rand(A,B,C);

I want to subtract X from each slice C of Y. This is a fairly common problem and i have found three alternative solutions:

% Approach 1: for-loop
tic
Z1 = zeros(size(Y));
for i=1:C
    Z1(:,:,i) = Y(:,:,i) - X;
end
toc

% Approach 2: repmat
tic
Z2 = Y - repmat(X,[1 1 C]);
toc

% Approach 3: bsxfun
tic
Z3=bsxfun(@minus,Y,X);
toc

I'm building a program which frequently (i.e., many thousands of times) solves problems like this, so i'm looking for the most efficient solution. Here is a common pattern of results:

Elapsed time is 0.013527 seconds.
Elapsed time is 0.004080 seconds.
Elapsed time is 0.006310 seconds.

The loop is clearly slower, and bsxfun is a little slower than repmat. I find the same pattern when i element-wise multiply (rather than subtract) X against slices of Y, though repmat and bsxfun are a little closer in multiplication.

Increasing the size of the data...

A=500;
B=500;
C=1000;
Elapsed time is 2.049753 seconds.
Elapsed time is 0.570809 seconds.
Elapsed time is 1.016121 seconds.

Here, repmat is the clear winner. I'm wondering if anyone in the SO community has a cool trick up their sleeves to speed this operation up at all.

Shai
  • 111,146
  • 38
  • 238
  • 371
Nolan Conaway
  • 2,639
  • 1
  • 26
  • 42
  • 4
    Try increasing the size of your matrices. You'll see that `bsxfun` is better. It performs the replication of elements under the hood. In addition, this post may provide some insight: http://stackoverflow.com/questions/23879888/matlab-subtracting-matrix-subsets-by-specific-rows/ - This post performs operations similar to what you're looking at, and it has some timings for each attempt. `bsxfun` is definitely the winner here IMHO. – rayryeng Oct 22 '15 at 02:07
  • 1
    Some reliable benchmarking methods would use [`timeit`](http://www.mathworks.com/help/matlab/ref/timeit.html). You can also look to clear memory at the end of each approach. See [`here`](http://stackoverflow.com/a/29719681/3293881) for similar benchmarking methods and comparison between BSXFUN and REPMAT for built-ins including `@minus`. – Divakar Oct 22 '15 at 08:32
  • If your program is doing this operation many thousands of times it might be worth looking at the problem at a higher level than this. i.e. can you process your thousands of cases in one go? – Adrian Oct 23 '15 at 08:14

1 Answers1

2

Depending on your real case scenario, bsxfun and repmat will sometimes have some advantage over the other, just like @rayryeng suggested. There is one other option you can consider : mex. I hard coded some parameters for better performance here.

#include "mex.h"
#include "matrix.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{

    double *A, *B, *C;
    int ind_1, ind_2, ind_3, ind_21, ind_32, ind_321, Dims[3] = {500,500,5000};

    plhs[0] = mxCreateNumericArray(3, Dims, mxDOUBLE_CLASS, mxREAL);
    A = mxGetPr(prhs[0]);
    B = mxGetPr(prhs[1]);
    C = mxGetPr(plhs[0]);

    for ( int ind_3 = 0; ind_3 < 5000; ind_3++)
    {
        ind_32 = ind_3*250000;
        for ( int ind_2 = 0; ind_2 < 500; ind_2++)
        {
            ind_21 = ind_2*500;         // taken out of the innermost loop to save some calculation
            ind_321 = ind_32 + ind_21;
            for ( int ind_1 = 0 ; ind_1 < 500; ind_1++)
            {
                C[ind_1 + ind_321] = A[ind_1 + ind_321] - B[ind_1 + ind_21];
            }
        }
    }
} 

To use it, type this into command window ( assuming you name the above c file mexsubtract.c )

mex -WIN64 mexsubtract.c

Then you can use it like this:

Z4 = mexsubtract(Y,X);

Here are some test results on my computer using A=500, B=500, C=5000:

(repmat) Elapsed time is 3.441695 seconds.
(bsxfun) Elapsed time is 3.357830 seconds.
(cmex)   Elapsed time is 3.391378 seconds.

It's a close contender and in some more extreme case, it'll have an edge. For example, this is what I got with A = 10, B = 500, C = 200000 :

(repmat) Elapsed time is 2.769177 seconds.
(bsxfun) Elapsed time is 3.178385 seconds.
(cmex)   Elapsed time is 2.552115 seconds.
user3667217
  • 2,172
  • 2
  • 17
  • 29
  • Forgot about mex! i would have thought for sure that it would have a bigger edge on repmat and bsx.. – Nolan Conaway Oct 22 '15 at 21:25
  • I'd argue that MEX is certainly not required for an operation like this - especially since `bsxfun` / `repmat` already farm out their implementations to MEX wrappers anyway. However, it's interesting to see what the timings are like. Thanks for sharing! – rayryeng Oct 23 '15 at 07:29
  • I'd agree that this is not necessary for general purposes. But optimization would be a special case, right ? Honestly, I had no prior experience with optimization. This is an interesting little experiment for me. IMHO, if even just a little improvement will be meaningful and the development won't be dauntingly long, then it's time to open the hood, no ? – user3667217 Oct 23 '15 at 08:03