C and C++ weird performance difference

Question

Before you read my post, please take into consideration that I am new to both C and C++. I'm mostly a managed code developer.

I have two pieces of identical code (or that's what I believe at least). One in C and one in C++. The code basically checks if the number is a prime, and if it is, it will store it in a container.

C++

Main.cpp

#include <iostream>
#include <vector>
#include <time.h>

static bool isPrime(const int& number) {

    if((number & 1) == 0) {
        if(number == 2)
            return true;
        else
            return false;
    }

    for(int i = 3; (i * i) <= number; i++) {

        if((number % i) == 0)
            return false;
    }

    return number != 1;
}

int main(int argc, const char * argv[]) {

    std::vector<int> vector;
    clock_t start = clock();
    for(int i = 0; i < 30000000; i++) {

        if(isPrime(i))
            vector.push_back(i);
    }
    clock_t end = clock();
    clock_t seconds = (end - start) / CLOCKS_PER_SEC;
    std::cout << "done after " << seconds << " seconds " << std::endl;

    return 0;
}

C

Vector.c

#include <stdlib.h>


typedef struct vector_class {
    void(*push_back)(struct vector_class *vector_instance, const int *data);
    int *data;
    int length;
    int capacity;
} vector;

static void push_back(vector *vector_instance, const int *data) {

    if(vector_instance->length >= vector_instance->capacity) {

        vector_instance->capacity *= 2;
        vector_instance->data = (int*) realloc(vector_instance->data, sizeof(int) * vector_instance->capacity);
    }
    vector_instance->data[vector_instance->length] = *data;
    vector_instance->length++;
}

static void vector_constructor(vector *vector_instance) {

    vector_instance->push_back = &push_back;
    vector_instance->length = 0;
    vector_instance->capacity = 2;
    vector_instance->data = (int*)malloc(sizeof(*vector_instance->data) * vector_instance->capacity);

}

static void vector_destructor(vector *vector_instance) {

    free(vector_instance->data);
    vector_instance->length = 0;
    vector_instance->capacity = 0;
    vector_instance->data = NULL;
}

Main.c

#include <stdio.h>
#include "vector.c"
#include <time.h>

static int isPrime (const int *number) {

    if((*number & 1) == 0) {
        if(*number == 2)
            return 1;
        else
            return 0;
    }

    for(int i = 3; (i * i) <= *number; i += 2) {

        if((*number % i) == 0)
            return 0;
    }

    return *number != 1;
}

int main(int argc, const char * argv[]) {
    vector v;
    vector_constructor(&v);
    clock_t start = clock();
    for(int i = 0; i <= 30000000; i++) {

        if(isPrime(&i))
            v.push_back(&v, &i);
    }
    clock_t end = clock();
    clock_t seconds = (end - start) / CLOCKS_PER_SEC;
    printf("%lu seconds \n", seconds);

    for(int i = 0; i < v.length; i++) {

        //printf("%d \n", v.data[i]);
    }
    vector_destructor(&v);
    return 0;
}

I compile both programs on my OS X Mavericks, with the built in Clang compiler.

C++

g++ -O3 -std=c++11 Main.cpp

C

gcc -O3 -std=c99 Main.c

Both get compiled trouble free, and they also run trouble free. However..

I get different time results.

C finishes after 12 seconds

C++ finishes after 26 seconds

Can anyone point out what I am doing wrong? Thanks!

Your code is not quite the same - the loop in `isPrime` has different increments. Not sure if this'd make a difference though. — Drew McGowen, Mar 16 '15 at 13:35
It's hilarious that your C code is calling push_back through a function pointer and it is still much faster than your C++. But I think the different loop increments are the main reason for the performance difference. Your C++ version is likely doing twice as many moduli as your C version. — jschultz410, Mar 16 '15 at 13:37
Also, the STL `vector` may have different rules for expanding its capacity. If it, say, uses `capacity *= 1.5`, it'll call `new` more times. — Drew McGowen, Mar 16 '15 at 13:38
@DrewMcGowen I think it would make that much of a difference. The C++ version is checking twice as many divisions and is running about twice as long. — IronMensan, Mar 16 '15 at 13:43
@Daniel, try to make two different tests. First - adding in vector N numbers in C and C++, second - Checking IsPrime(...) M times. That will help you to understand where is problem, in function or in vector. I guess that increasing +2 in C vs increasing +1 in C++ may give twice more operations in C++, so that may be reason of your result. — Arkady, Mar 16 '15 at 13:53
Thanks guys. I did modify the loops, from ++ to +=2. And changed the C vector from *2 to *1.5 C is still faster than C++ by 3 seconds, but I guess thats compiler optimisation. — Daniel, Mar 16 '15 at 14:15
Unrelated to the optimization: The way you are using the C version's push_back with passing the address of i; you are passing the same address each time. And the value at that address is changing in each iteration of the for loop. — A.J., Mar 16 '15 at 15:18
Interestingly, on my machine, after fixing the loop iterator, the C runs in 12.78 seconds and the C++ in 12.81 -- basically the same. (I used `time ./prime` on UNIX to get more granularity on the time.) — JohnH, Mar 16 '15 at 15:29

jschultz410 · Accepted Answer · 2015-03-16T14:20:40.457

6

Your programs are subtly different in isPrime. In your C++ program:

for(int i = 3; (i * i) <= number; i++) {

In your C program:

for(int i = 3; (i * i) <= *number; i += 2) {

So, your C++ program is computing the remainder about twice as many times as your C program, which likely explains your performance discrepancy.

Beyond that, I recommend that you not pass int by reference or pointer unless you have a good reason. Hopefully, the compiler would be smart enough to figure out that you didn't need to and optimize that out, but who knows?

Also, you want to avoid calling functions through function pointers, like you do in your C program, when you can. They usually hurt a compiler's ability to inline optimize functions. It might be the case here that the compiler is smart enough to inline the call anyway, but again who knows?

Finally, if computing all primes less than N is really what you are after and this just isn't a toy to benchmark C vs. C++, then look into the Sieve of Eratosthenes or the Sieve of Sundaram. Alternatively, you can pass your vector of already known primes into isPrime and check only against already known primes rather than all odd numbers.

edited Mar 16 '15 at 14:20

answered Mar 16 '15 at 13:55

jschultz410

2,849
14
22

Thank you, That helped significantly. However, it is still 4 seconds slower. Is there something else that I have overseen? – Daniel Mar 16 '15 at 13:56
1

Someone else remarked that a common implementation of vector grows its capacity by 1.5 rather than 2 each time. You could change your C program to grow by 1.5 rather than 2 each time and see if that closes the gap entirely. – jschultz410 Mar 16 '15 at 13:58
I changed that as well, it keeps getting closer. C finishes after 11 seconds. C++ finishes after 13 seconds.. I guess this is optimal enough for me :) – Daniel Mar 16 '15 at 14:12
@jshultz410: GCC supposedly also uses a factor of 2x. Personally I would tend to suspect `realloc`. A std::vector ought to copy over the data manually while `realloc` may use `mremap` or other tricks to avoid the copying. – doynax Mar 16 '15 at 14:12
Try adding `vector.reserve( 5,761,455 );` That will completely eliminate all the `std::vector<>` reallocations and you'll run a lot faster. – Rob K Mar 16 '15 at 14:13
@doynax I'd be surprised if std::vector didn't have an optimized template for POD's (or at least pointers) that pulled the same kind of tricks. – jschultz410 Mar 16 '15 at 14:17
1

@RobK hopefully you mean `vector.reserve(5761455);` rather than using the comma operator! – M.M Mar 16 '15 at 14:22
@jschultz410: Possibly but I had trouble making GCC generate `realloc` the last time I tried, which admittedly was a while back. It would need to bypass the allocator interface, and specialize on POD together with the standard allocator. – doynax Mar 16 '15 at 14:29
@jschultz410 I doubt you will find a standard library that uses `realloc` in any version of `vector`. They all go through the allocator interface, which doesn't support anything `realloc`-like. – Sebastian Redl Mar 16 '15 at 14:29
Yes, I meant to clip those commas out! – Rob K Mar 16 '15 at 14:32
That's interesting that they didn't put a realloc like interface into std::allocator considering that C had that approach for so long. A default implementation that used allocate, memcpy and deallocate could easily be provided. Then only containers that could benefit from it need use it. I guess vector, deque and unordered_map of POD's might be the only containers to benefit from it, so they considered it not worth complicating the interface ... – jschultz410 Mar 16 '15 at 14:42
On my systemwith mingw-4.9.2-x86_64, both take 66 seconds . Note that this program does not really test vector, since nearly all of the processing time is spent doing divisions. , for which both versions generate effectively the same assembly. You could use something like gprof to see where time is spent. – M.M Mar 16 '15 at 14:47
@jshultz410: Perhaps but since when did the C++ committee give a toss about not complicating the language? *Grumble..* – doynax Mar 16 '15 at 14:53
1

@jschultz410 http://stackoverflow.com/questions/3105001/why-is-there-no-reallocation-functionality-in-c-allocators – M.M Mar 16 '15 at 15:42

C and C++ weird performance difference

1 Answers1