Why is my C++ code three times slower than the C equivalent on LeetCode?

Question

I've been doing some of the LeetCode problems, and I notice that the C solutions are a couple of times faster than the exact same thing in C++. For example:

Updated with a couple of simpler examples:

Given a sorted array and a target value, return the index if the target is found. If not, return the index where it would be if it were inserted in order. You may assume no duplicates in the array. (Link to question on LeetCode)

My solution in C, runs in 3 ms:

int searchInsert(int A[], int n, int target) {
    int left = 0;
    int right = n;
    int mid = 0;
    while (left<right) {
        mid = (left + right) / 2;
        if (A[mid]<target) {
            left = mid + 1;
        }
        else if (A[mid]>target) {
            right = mid;
        }
        else {
            return mid;
        }
    }
    return left;
}

My other C++ solution, exactly the same but as a member function of the Solution class runs in 13 ms:

class Solution {
public:
    int searchInsert(int A[], int n, int target) {
        int left = 0;
        int right = n;
        int mid = 0;
        while (left<right) {
            mid = (left + right) / 2;
            if (A[mid]<target) {
                left = mid + 1;
            }
            else if (A[mid]>target) {
                right = mid;
            }
            else {
                return mid;
            }
        }
        return left;
    }
};

Even simpler example:

Reverse the digits of an integer. Return 0 if the result will overflow. (Link to question on LeetCode)

The C version runs in 6 ms:

int reverse(int x) {
    long rev = x % 10;
    x /= 10;
    while (x != 0) {
        rev *= 10L;
        rev += x % 10;
        x /= 10;
        if (rev>(-1U >> 1) || rev < (1 << 31)) {
            return 0;
        }
    }
    return rev;
}

And the C++ version is exactly the same but as a member function of the Solution class, and runs for 19 ms:

class Solution {
public:
    int reverse(int x) {
        long rev = x % 10;
        x /= 10;
        while (x != 0) {
            rev *= 10L;
            rev += x % 10;
            x /= 10;
            if (rev>(-1U >> 1) || rev < (1 << 31)) {
                return 0;
            }
        }
        return rev;
    }
};

I see how there would be considerable overhead from using vector of vector as a 2D array in the original example if the LeetCode testing system doesn't compile the code with optimisation enabled. But the simpler examples above shouldn't suffer that issue because the data structures are pretty raw, especially in the second case where all you have is long or integer arithmetics. That's still slower by a factor of three.

I'm starting to think that there might be something odd happening with the way LeetCode do the benchmarking in general because even in the C version of the integer reversing problem you get a huge bump in running time from just replacing the line if (rev>(-1U >> 1) || rev < (1 << 31)) { with if (rev>INT_MAX || rev < INT_MIN) {

Now, I suppose having to #include<limits.h> might have something to do with that but it seems a bit extreme that this simple change bumps the execution time from just 6 ms to 19 ms.

Because you're using a vector of vector, which means each of your columns (or rows) is a separate allocation which may or may not be in a cache friendly location. — Robinson, Apr 08 '15 at 18:10
Are those times from single runs, or averages over multiple runs? — John Bode, Apr 08 '15 at 18:10
Are your optimisations on? Some C++ compilers enable range-checking on vectors when it's off, like in debug builds. — molbdnilo, Apr 08 '15 at 18:13
What compiler options are you using? Why does the C++ solution get wrapped in a class, are you including the class initialization in your timer? Speaking of which, how are you timing the execution? — IllusiveBrian, Apr 08 '15 at 18:14
Are the allocating schemes different? A single block in C vs. multiple blocks (vectors) in C++. Anyway you should prefer a single block of memory (no vector of vector). — , Apr 08 '15 at 18:15
Try a thin [wrapper around a single vector instead](http://stackoverflow.com/a/12009991/179910). — Jerry Coffin, Apr 08 '15 at 18:23
Although identical syntactically, the expression "matrix[i][j]" in C and in C++ are likely implemented very differently in your case (you can make it equivalent in C++ by allocating matrix as one single block of memory stored in row major order). Look for a definition of "operator []" in your matrix class, it will likely point to where the problem is. — amdn, Apr 08 '15 at 18:25
Additionally, the C++ was probably compiled with exceptions enabled, and the C wasn't, which can _sometimes_ incur a performance penalty. To reduce this, mark C++ functions as `noexcept` or `nothrow` (depending on your compiler). — Mooing Duck, Apr 08 '15 at 18:41
With most halfway-modern compilers, exception don't incur an overhead unless they are thrown, @MooingDuck. The exception-handling code is not stored inline with the "normal" code and thus doesn't increase size and cache misses. An IMHO more likely influence is a container implementation with debug diagnostics. — Ulrich Eckhardt, Apr 08 '15 at 18:46
@UlrichEckhardt: Except, you know, [MSVC compiling x86](http://stackoverflow.com/a/3590198/845092) ([and here](https://msdn.microsoft.com/en-us/library/vstudio/c0hwkhwe(v=vs.100).aspx)), and [GCC on x86](http://sourceforge.net/p/mingw-w64/mailman/message/30171170/) (due to patent issues)... But you're right the containers + debug are a _way_ bigger consideration. — Mooing Duck, Apr 08 '15 at 18:53
Create a matrix class and represent 2D matrices with an inline (x, y) overloaded function call that internally calculates position into a single dimension vector. It'll then be almost on par with C rather than vector of vector as others have pointed out. — hookenz, Apr 08 '15 at 19:36
For benchmark questions you should post [MCVE](http://stackoverflow.com/help/mcve)s including the benchmarking code and your compiler switches. See bames53's code sample for what you should be posting. — M.M, Apr 09 '15 at 01:10
@Robinson That makes sense. I have found simpler examples that avoid vectors of vectors and any complex data structures in general. — nwod, Apr 09 '15 at 13:09
@Namfuak I have no control over the compiler options, and the C++ solution is wrapped in a class because that's the format the LeetCode website requires for C++ solutions. The code I have shown is all that I am allowed to provide as an answer, and the compiling and benchmarking is done by the LeetCode backend, presumably in a reasonable way. Or maybe not so reasonable way. I can't know. But they do specify that C++ environment is g++ 4.9.1 and the C one is gcc 4.9.1. — nwod, Apr 09 '15 at 13:14
@MattMcNabb Unfortunately, I don't have any control over the benchmarking code and compiler switches used on LeetCode. I have provided a couple of simpler MCVEs though. I know it's far from ideal but you can just copy and paste the whole example on the LeetCode website for completeness and verification... — nwod, Apr 09 '15 at 13:24

score 43 · Accepted Answer · edited May 23 '17 at 12:23

43

Lately I've been seeing the vector<vector<int>> suggestion a lot for doing 2d arrays in C++, and I've been pointing out to people why this really isn't a good idea. It's a handy trick to know when slapping together temporary code, but there's (almost) never any reason to ever use it for real code. The right thing to do is to use a class that wraps a contiguous block of memory.

So my first reaction might be to point to this as a possible source for the disparity. However you're also using int** in the C version, which is generally a sign of the exact same problem as vector<vector<int>>.

So instead I decided to just compare the two solutions.

http://coliru.stacked-crooked.com/a/fa8441cc5baa0391

6468424
6588511

That's the time taken by the 'C version' vs the 'C++ version' in nanoseconds.

My results don't show anything like the disparity you describe. Then it occurred to me to check a common mistake people make when benchmarking

http://coliru.stacked-crooked.com/a/e57d791876b9252b

18386695
42400612

Notice that the -O3 flag from the first example has become -O0, which disables optimization.

Conclusion: you're probably comparing unoptimized executables.

C++ supports building rich abstractions that don't require overhead, but eliminating the the overhead does require certain code transformations that play havoc with the 'debuggability' of code.

That means debug builds avoid those transformations and therefore C++ debug builds are often slower than debug builds of C style code because C style code just doesn't use much abstraction. Seeing a 130% slowdown such as the above is not at all surprising when timing, for example, machine code that uses function calls in place of simple store instructions.

Some code really needs optimizations in order to have reasonable performance even for debugging, so compilers often offer a mode that applies some optimizations which don't cause too much trouble for debuggers. Clang and gcc use -O1 for this, and you can see that even this level of optimization essentially eliminates the gap in this program between C style code and the more C++ style code:

http://coliru.stacked-crooked.com/a/13967ebcfcfa4073

8389992
8196935

Update:

In those later examples optimization shouldn't make a difference, since the C++ is not using any abstraction beyond what the C version is doing. I'm guessing that the explanation for this is that the examples are being compiled with different compilers or with some other different compiler options. Without knowing how the compilation is done I would say it makes no sense to compare these runtime numbers; LeetCode is clearly not producing an apples to apples comparison.

edited May 23 '17 at 12:23

Community

1
1

answered Apr 08 '15 at 19:31

bames53

86,085
15
179
244

1

I've often seen `std::vector` being optimized down to "raw" arrays, but, as you correctly say, you need to turn optimizations on. With C you're already using that kind of arrays, so unoptimized code will be faster. – edmz Apr 08 '15 at 20:09
@bames53 Actually, it's not always the case that `vector>` is worse than a custom struct. This depends a lot on the application. (E.g. adding another column is an extremely costly thing for the struct, but not for the nested vector). As always, one should profile the actual use case. – stefan Apr 08 '15 at 20:24
2

On a side note, gcc use `-Og` for debugger-friendly optimizations. – kiwixz Apr 08 '15 at 20:38
@black: You've seen `vector` optimized into a raw array? That shouldn't be possible, unless the optimizer is able to turn a heap allocation into a stack-based one, which is far beyond the optimization abilities of any compiler I've ever seen. – StilesCrisis Apr 08 '15 at 20:43
Pointers to a heap allocation are often referred to as raw arrays, because array and pointer syntax are interchangeable in C. Never heard that phrase to mean stack arrays specifically. – SilverbackNet Apr 09 '15 at 00:35
@stefan That's true, it does depend. However cases where `vector>` has an advantage are specialized and uncommon compared to how often a contiguous block of memory is a better arrangement. – bames53 Apr 09 '15 at 00:43
@bames53 That's a really good point. Sadly I have no control over how the code was compiled or benchmarked. However, I can see how it makes a lot of sense for LeetCode to be compiling and comparing the code people submit without compiler optimisations. I have gathered a couple of simpler minimal examples that don't involve any complicated data structures and updated the question. Do you suppose lack of compiler optimisation might still be the cause of the issue? – nwod Apr 09 '15 at 13:01
@barnes53 and to all: Just to be clear, he's not benchmarking the code himself. He's submitting this code to a competitive coding website which gives him back it's own interpretation of what the codes timing is. I imagine they don't give him control over optimization options. – MrSynAckSter Apr 09 '15 at 17:24
@nwod In those later examples optimization shouldn't make a difference, since the C++ is not using any abstraction beyond what the C version is doing. I'm guessing that the explanation for this is that the examples are being compiled with different compilers or with some other different compiler options. Without knowing how the compilation is done I would say it makes no sense to compare these runtime numbers; it's clearly not an apples to apples comparison. – bames53 Apr 09 '15 at 21:14
@bames53 That's what I thought too but the LeetCode website gives you a plot of the run time distributions for all accepted solutions in all programming languages on the same graph, so the implication is that the comparison is valid. On the other hand, having used LeetCode for a few days now, I'm becoming more and more convinced that there's a lot of corner cutting and roughness around the edges of that website... – nwod Apr 11 '15 at 10:10

score -4 · Answer 2 · edited Apr 08 '15 at 18:41

-4

You are using vector of vector in your C++ code snippet. Vectors are sequence containers in C++ that are like arrays that can change in size. Instead of vector<vector<int>> if you use statically allocated arrays, that would be better. You may use your own Array class as well with operator [] overloaded, but vector has more overhead as it dynamically resizes when you add more elements than its original size. In C++, you use call by reference to further reduce your time if you compare that with C. C++ should run even faster if written well.

edited Apr 08 '15 at 18:41

Mooing Duck

64,318
19
100
158

answered Apr 08 '15 at 18:37

Dr. Debasish Jana

6,980
4
30
69

3

I don't see how this answers the question. Is the C++ code changing the size of the vectors? Of not, then why should that contribute to the running time? The vector is already being passed by reference. Although you've made true statements, I don't quite see how they're relevant to the question at hand. – Rob Kennedy Apr 08 '15 at 18:48
3

There is absolutely no reason why C++ should be faster (or slower) than C, having this simple data structure. – Apr 08 '15 at 18:52
Adding to @DieterLücking: C++'s original "compiler" output was simply C code. All it did was transform (then) "C++" into C code to be fed to a C compiler. No reason why C++ would be slower. Its original name (I think) was "C with classes". – Cole Tobin Apr 08 '15 at 21:42
@RobKennedy Do you know why this question was downvoted? Robinson's comment and bames53's answer said essentially the same thing. – nick Apr 09 '15 at 00:50
I don't know why you're asking me, @Nick. Robinson's comment suggests the problem is cache locality. Bames's answer says the problem is optimization. Those aren't the same things. – Rob Kennedy Apr 09 '15 at 01:46

Why is my C++ code three times slower than the C equivalent on LeetCode?

2 Answers2