Matrix multiplication is much slower when I write the same code as a C MEX file

Question

Here is the code in Matlab:

A=B*C;

B is a 512-by-1024 matrix and C is a 1024-by-1024 matrix.

First, I vectorized the matrices, because my later implementation needs them in vector form.

Then I ran the code in both MATLAB and C and compared the computation times. I expected the C code to be faster, but the result was the opposite.

    B = rand(512,1024);
    C = rand(1024,1024);

    tic
    A = B*C;
    toc

    B = B(:);
    C = C(:);

    B_vec(1,:) = B;
    C_vec(1,:) = C;

    tic
    mult_test = mult_calc(B_vec,C_vec);
    toc

Here is the C MEX file:

    #include "mex.h"
    #include <stdint.h>
    #include <time.h>
    #include <math.h>

    /* The computational routine: A = B*C on flattened column-major data,
       with the sizes hard-coded (B is 512x1024, C is 1024x1024) */
    void mult_calc(double *B, double *C, int m, double *mult_test)
    {
        /* compute each of the m output elements as a dot product */
        for (int i = 0; i < m; i++) {
            int j = i / 512;   /* column index of the output element */
            int z = i % 512;   /* row index of the output element */

            mult_test[i] = 0;

            for (int k = 0; k < 1024; k++) {
                mult_test[i] += B[z + (512 * k)] * C[(j * 1024) + k];
            }
        }
    }

    /* The gateway function */
    void mexFunction(int nlhs, mxArray *plhs[],
                     int nrhs, const mxArray *prhs[])
    {
        double *B;           /* 1xN input matrix */
        double *C;           /* 1xN input matrix */
        double *mult_test;   /* 1xN output matrix */
        int m;               /* length of the first input (512*1024, which
                                here equals the number of output elements) */

        /* create pointers to the real data in the input matrices */
        B = mxGetPr(prhs[0]);
        C = mxGetPr(prhs[1]);

        /* get the length of the first input vector */
        m = mxGetN(prhs[0]);

        /* create the output matrix */
        plhs[0] = mxCreateDoubleMatrix(1, m, mxREAL);

        /* get a pointer to the real data in the output matrix */
        mult_test = mxGetPr(plhs[0]);

        /* call the computational routine */
        mult_calc(B, C, m, mult_test);
    }

The results are the same, but the computation times differ, as shown below:

MATLAB: Elapsed time is 0.016627 seconds.

C MEX: Elapsed time is 1.672939 seconds.

I do not know where the problem is in my case. Is there a specific reason for this?

I would be glad if anybody could help me with this.

Best

Matrix multiplication is well known for being a problem for cache. When you write a simple loop to multiply matrices, it traverses one of the matrices by columns. Cache designs typically load 32 or more bytes at a time. When you traverse a column of a matrix and load one eight-byte element, the processor actually loads 32 or more bytes. So your code ends up loading the entire matrix four times. Matlab likely contains code designed for this, and it processes four or more columns at a time instead of one. For further information, look for “matrix tiling”. — Eric Postpischil, Feb 24 '20 at 12:44
Basically, implementing high-performance matrix multiplication is a complicated task, and you may be better off using an existing library for matrix operations (such as BLAS or equivalent) than writing your own. — Eric Postpischil, Feb 24 '20 at 12:47
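To make the "matrix tiling" idea from the comments above concrete, here is a minimal sketch in plain C, assuming column-major storage as MATLAB uses. The function name `matmul_tiled`, the helper `min_sz`, and the `TILE` value of 64 are illustrative assumptions, not taken from the question or from MATLAB. It only shows the access pattern; a tuned BLAS does considerably more (vectorization, threading, packing) and should be expected to remain faster.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size; the best value depends on the cache and needs tuning. */
#define TILE 64

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* A (m x n) = B (m x p) * C (p x n), all stored column-major.
   The loops over k and i are split into tiles so that a TILE x TILE block
   of B is reused for every column j before moving on, and the innermost
   loop walks B and A contiguously instead of with a stride. */
void matmul_tiled(const double *B, const double *C, double *A,
                  size_t m, size_t p, size_t n)
{
    memset(A, 0, m * n * sizeof *A);

    for (size_t kk = 0; kk < p; kk += TILE) {
        size_t kend = min_sz(kk + TILE, p);
        for (size_t ii = 0; ii < m; ii += TILE) {
            size_t iend = min_sz(ii + TILE, m);
            for (size_t j = 0; j < n; j++) {
                double *acol = A + m * j;            /* column j of A */
                for (size_t k = kk; k < kend; k++) {
                    double c = C[k + p * j];         /* element C(k, j) */
                    const double *bcol = B + m * k;  /* column k of B */
                    for (size_t i = ii; i < iend; i++)
                        acol[i] += bcol[i] * c;
                }
            }
        }
    }
}
```

With the sizes from the question, the call would be `matmul_tiled(B, C, A, 512, 1024, 1024)` on the same flattened vectors.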
Also measure the time taken by the extra MEX functions such as mxCreateDoubleMatrix! — B. Go, Feb 24 '20 at 13:14
It's unlikely that you will ever beat MATLAB on this, and if you do, give them a call, they will hire you. — Ander Biguri, Feb 24 '20 at 13:30
Depending on the processor, _signed_ division and other math may take longer than _unsigned_. Consider `int i` --> `unsigned i` (and others) for a potential modest improvement. — chux - Reinstate Monica, Feb 24 '20 at 14:34
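Regarding the division remark, one way to sidestep the question is to avoid `/` and `%` altogether. The sketch below does the same arithmetic as the question's `mult_calc`, but takes the row and column indices from two nested loops and accumulates into a local variable. The name `mult_calc_nodiv` and the parameters `rows`, `inner`, and `cols` are assumptions made for illustration (512, 1024, and 1024 in the question). Since the original code divides only once per 1024 multiply-adds, any gain from this change alone is probably small; the strided reads of `B` are untouched here.

```c
#include <stddef.h>

/* out (rows x cols) = B (rows x inner) * C (inner x cols), column-major.
   Same dot-product order as the question's code, but with unsigned size_t
   indices and no division or modulo. */
void mult_calc_nodiv(const double *B, const double *C, double *out,
                     size_t rows, size_t inner, size_t cols)
{
    for (size_t j = 0; j < cols; j++) {
        const double *ccol = C + j * inner;   /* column j of C, contiguous */
        for (size_t z = 0; z < rows; z++) {
            double sum = 0.0;                 /* local accumulator */
            for (size_t k = 0; k < inner; k++)
                sum += B[z + rows * k] * ccol[k];
            out[z + rows * j] = sum;
        }
    }
}
```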
