3

I have been using MATLAB for a while for my projects and I have almost never had an experience in C++.

I needed speed and I heard that C++ can be more efficient and faster than MATLAB. So I tried this:

I created a matrix of random numbers using rand(5000,5000) on MATLAB.

And in C++, I have initialized a 2D vector created 2 for loops each of them looping for 5000 times and each time. MATLAB was 4-5x faster, so I thought it is because matlab executes vectorized codes in parallel, then I written the C++ code using parallel_for. Here is the code:

#include "stdafx.h"
#include <iostream>
#include <vector>
#include <fstream>
#include <ppl.h>
using namespace std;
using namespace concurrency;
int main();
{
    int a = 5000, b = 5000, j, k;
    vector< vector<int> > vec(a, vector<imt>(b));
    parallel_for(int(0), a, [&](int i) {
        for (j = 0; j <b; j++)
        {
            vec[i][j] = rand();
        }
    });
}

So the code above is about 25% faster than MATLAB's rand(5000,5000) Yet C++ is using 100% of the CPU while MATLAB is using 30% of CPU.

So I forced MATLAB to use all of the CPU by running 3 instances of MATLAB using rand(5000,5000) and divided the time it takes by 3. It made MATLAB twice as fast as C++.

I wonder what am I missing? I know this is a tiny example but I need an answer to be sure to port my code to C++.

Current status:

When I write C++ code without parallel_for I get half of the MATLAB's speed with the same CPU usage. Yet people who gave answers say that they are almost the same. I do not understand what I am missing

here is a snapshot of the optimization menu enter image description here

  • 2
    May not be related, just curious: did you try using a 1D vector of size 25000 instead, and then pretend it to be 2D during the execution? –  Jul 29 '15 at 06:46
  • Don't use `vector` here, just create a 2D array of ints. The `[]` operator for vector is not equivalent to the classic `[]` operator, it takes more time to do the job (e.g. check size). – Bentoy13 Jul 29 '15 at 06:47
  • 1
    @Bentoy13 vector `operator[]` does not check size, you are thinking of `at()`. – Chris Drew Jul 29 '15 at 06:52
  • 6
    *I need an answer to be sure to port my code to C++* Stop ! Don't. Most of Matlab's core computational routines are already written in C++ (or another compiled language) and will out-perform anything you are likely to write yourself. Many of them are already multi-threaded, if you want to write faster C++ you'll have to do that too. There are many questions and answers here on SO relating to the (generally) useless pursuit of writing faster code than Matlab. – High Performance Mark Jul 29 '15 at 06:54
  • @ChrisDrew Oh, thanks, I was inverting these ones. Sorry. – Bentoy13 Jul 29 '15 at 07:05
  • @HighPerformanceMark But the guy in the post below says that he can achieve 10x to 100x improvement when porting from MATLAB to C++ http://stackoverflow.com/questions/20513071/performance-tradeoff-when-is-matlab-better-slower-than-c-c – Serdar Oztetik Jul 29 '15 at 07:06
  • @Bentoy13 I did what you said, and the speed was the same with the vector one. And I had to do a workaround because it gives me stackoverflow when I 5000x5000 array of ints instead of vector. – Serdar Oztetik Jul 29 '15 at 07:09
  • 4
    He says *I have been using Matlab and C++ for about 10 years* while you admit *I have almost never had an experience in C++.* That's part of my argument -- it might take you (or me) 10 years with C++ to be able to write faster code for the core computational routines that Matlab provides. But it's up to you. – High Performance Mark Jul 29 '15 at 07:10
  • 1
    @user2096605 I never say create a _static_ array, `new[]` is way more suitable here. Believe HighPerformanceMark: you shouldn't try to optimize primitive functions of Matlab. Some Matlab code can be faster if rewritten in C++, especially if your write some poorly Matlab code. Start optimizing your matlab code, you can gain speed using their primitive. And if you want best speed, Matlab has a compiler + parallel toolbox (not cheap). – Bentoy13 Jul 29 '15 at 07:17
  • @HighPerformanceMark I understand now. But I still think it is beneficial for me to try and see how fast I can get. Also I believe what I ask here is one of the extreme examples because I know such a core Matlab function `rand` should be as efficient as it can be. So learning the optimizations should help me for the rest of the code that are not as efficient in Matlab. – Serdar Oztetik Jul 29 '15 at 07:19
  • 2
    You might be interested in the [Armadillo](http://arma.sourceforge.net/) library. – Chris Drew Jul 29 '15 at 07:29
  • @Bentoy13 I will do it using arrays. – Serdar Oztetik Jul 29 '15 at 07:33
  • @ChrisDrew I am looking into it – Serdar Oztetik Jul 29 '15 at 07:34
  • How does c++ handle the seeds in a parallel for? There is a good chance that the code above generates repeated numbers because all threads fork with the same seed set. – Daniel Jul 29 '15 at 07:36
  • Out of interest, what were the timing results and how did you measure? – Chris Drew Jul 29 '15 at 07:53
  • @ChrisDrew I did measure MATLAB using tic toc just like in the first answer I did 100 loops. it took me 50 seconds. In the case of C++ I didn't know the way to do it in like the answer so again I did a 100 loop. I built the code as an executable and measured it manually (using a chronometer) it again took around 45-50 seconds. only thing different was C++ was using all the CPU ( I was doing in parallel) yet matlab was using 30ish % (which is the 100% of 1 thread) – Serdar Oztetik Jul 29 '15 at 08:22
  • I did some years ago an Inpaint algorithm in Matlab, after porting it to C++ the gain factor was around 100x... so yes... maybe you have to consider some things. First of all, matlab uses a memory pool, so it is not fair to use "new" here, do it on the stack. Are you compiling with optimizations on?, are you enabling vectorization? btw rand() implementation on Matlab is WAY better than the rand() implementation on you compiler. Another point you are not considering is cache friendly algorithms or prefetching positions to avoid a cache miss.. – Jose Palma Jul 29 '15 at 08:28
  • 3
    (1) rand() sucks. (2) I'd assume the C `rand()` takes a lock on the global RNG state, which kills your parallelism. Give every thread it's own random_device. see also http://stackoverflow.com/questions/7217791/random-numbers-in-c0x – peterchen Jul 29 '15 at 09:22
  • 1
    The two random number generators use completely different algorithms with different outputs and statistics. Comparing these is pointless. But you should still be able to easily write a Mersenne twister code in C/C++ that outperforms Matlab's `rand` by a factor of two or three times (I've done it myself). – horchler Aug 02 '15 at 16:08

4 Answers4

2

This is maybe no answer, but a litle hint. The comparison might be a bit unfair due to the usage of vectors.

Here is a comparison I've written. Both take up roughly 100% of one of the four threads available. In both cases I create 5000x5000 random numbers and do this 100 times for timing

Matlab

function stackoverflow

tic
for i=1:100
    A =rand(5000);
end
toc

Runtime: ~27.9 sec

C++

#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <ctime>

using namespace std;


int main(){

    int N = 5000;
    double ** A = new double*[N];
    for (int i=0;i<N;i++)
        A[i] = new double[N];


    srand(time(NULL));

    clock_t start = clock();
    for (int k=0;k<100;k++){
        for (int i=0;i<N;i++){
            for (int j=0;j<N;j++){
                A[i][j] = rand();
            }
        }
    }

    cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;

}

Runtime: ~28.7 sec

So both examples run almost equally fast.

Thomas
  • 1,199
  • 1
  • 14
  • 29
  • I just used the same comments with no change. Matlab takes 38 seconds yet C++ takes 80ish to complete. What am I missing? I am using visual C++ as my IDE. – Serdar Oztetik Jul 29 '15 at 09:09
  • I simply use `g++ filename.c` on a linux machine. – Thomas Jul 29 '15 at 10:10
  • @ChrisDrew yes I am using in release mode and running manually the exe file – Serdar Oztetik Jul 29 '15 at 11:00
  • 2
    The two random number generators are completely different algorithms with different outputs and statistics. Matlab's Mersenne twister-based `rand` produces variates on (0,1). C++'s linear-congruential generator-based `rand` produces variates on [0,`RAND_MAX`]. Comparing these is pointless. – horchler Aug 02 '15 at 16:01
2

When you call rand(5000,5000) in Matlab, Matlab executes the command by calling Intel MKL library, which is a highly optimized library written in C/C++ with lots of hand-coded assembly.

MKL should be faster than any straightforward C++ implementation, but there is an overhead for Matlab to call external library. The net result is that, for random number generation in smaller sizes (less than 1K for instance), plain C/C++ implementation will be faster, but for larger sizes, Matlab will benefit from super optimized MKL.

PhD AP EcE
  • 3,751
  • 2
  • 17
  • 15
1

After looking at @sonystarmap's answer, I added a few types of containers: double*, vector<double> and vector<vector<double> >. I also added tests where the "pointer-containers" are memset, since vector initialises all memory.

The C++ code was compiled with these optimization flag: -O3 -march=native

The results:

Matlab: Elapsed time is 28.457788 seconds.

C++:

T=23844.2ms

T=25161.5ms

T=25154ms

T=24197.3ms

T=24235.2ms

T=24166.1ms

I can essentially not find the large gain you mention.

#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <ctime>
#include <vector>
#include <cstring>

using namespace std;


int main(){

    const int N = 5000;

    {
        vector<double> A(N*N);

        srand(0);

        clock_t start = clock();
        for (int k=0;k<100;k++){
            for (int i=0;i<N;i++){
                for (int j=0;j<N;j++){
                    A[i*N+j] = rand();
                }
            }
        }

        cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
    }

    {
        vector<vector<double> > A(N);
        for (int i=0;i<N;i++)
            A[i] = vector<double>(N);

        srand(0);

        clock_t start = clock();
        for (int k=0;k<100;k++){
            for (int i=0;i<N;i++){
                for (int j=0;j<N;j++){
                    A[i][j] = rand();
                }
            }
        }

        cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
    }

    {
        double ** A = new double*[N];
        for (int i=0;i<N;i++)
            A[i] = new double[N];

        srand(0);

        clock_t start = clock();
        for (int k=0;k<100;k++){
            for (int i=0;i<N;i++){
                for (int j=0;j<N;j++){
                    A[i][j] = rand();
                }
            }
        }

        cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
    }

    {
        double ** A = new double*[N];
        for (int i=0;i<N;i++) {
            A[i] = new double[N];
            memset(A[i], 0, sizeof(double) * N);
        }

        srand(0);

        clock_t start = clock();
        for (int k=0;k<100;k++){
            for (int i=0;i<N;i++){
                for (int j=0;j<N;j++){
                    A[i][j] = rand();
                }
            }
        }

        cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
    }

    {
        double * A = new double[N * N];

        srand(0);

        clock_t start = clock();
        for (int k=0;k<100;k++){
            for (int i=0;i<N;i++){
                for (int j=0;j<N;j++){
                    A[i*N + j] = rand();
                }
            }
        }

        cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
    }

    {
        double * A = new double[N * N];
        memset(A, 0, sizeof(double) * N * N);

        srand(0);

        clock_t start = clock();
        for (int k=0;k<100;k++){
            for (int i=0;i<N;i++){
                for (int j=0;j<N;j++){
                    A[i*N + j] = rand();
                }
            }
        }

        cout << "T="<< (clock()-start)/(double)(CLOCKS_PER_SEC/1000)<< "ms " << endl;
    }
}
Jonas
  • 6,915
  • 8
  • 35
  • 53
  • here is the problem: I have tried both yours and sonystarmap's solution. I directly copied it to my visual studio. Built it. ran the executable. My C++ result came as 83000ms while Matlab result is 38 seconds (38000 ms) I do not see why? – Serdar Oztetik Jul 29 '15 at 09:03
  • 2
    Could you provide your project setup? I'm starting to think you are compiling on debug mode without any optimization. – Jose Palma Jul 29 '15 at 09:12
  • here is the photo of the screen where I built the code. I copied your code changed to release from debug. Built it. went to the folder where .exe exists. ran it. that's all I did and I do not know what else to do. You said something about compiling with optimizations on but I do not know how to do it. https://www.dropbox.com/s/hoxffuo3yeuabwe/Untitled.png?dl=0 – Serdar Oztetik Jul 29 '15 at 09:42
  • 1
    I don't know enough about VS, we need the arguments for the compiler, maybe this is under Project properties or somethign? – Thomas Jul 29 '15 at 10:16
0
#include <vector>
#include <iostream>
#include <cstdlib>
#include <ctime>
#include <cstring>

int main() {
  const int N = 5000;
  std::vector<int> A(N*N);
  srand(0);
  clock_t start = clock();
  for(int k = 0; k < 100; ++k){
    for(int i = 0; i < N * N; ++i) {
        A[i] = rand();
    }
  }
  std::cout << (clock()-start)/(double)(CLOCKS_PER_SEC/1000) << "ms" << "\n";
  return 0;
}

Went from 25-27 seconds on my workstation without any optimization flag on the compiler to 21 seconds with

-O3 -g -Wall -ftree-vectorizer-verbose=5 -msse -msse2 -msse3 -march=native -mtune=native -ffast-math

Jose Palma
  • 756
  • 6
  • 13