0

I Created a struct of three integers and overloaded the operator*=, then created a vector of size 100000000 and filled it with 0 and 1 randomly.

Then tested two versions of a function that sums up the elements multiplied by 2. When I use the function that uses the overloaded operator*= I got almost a 60% overhead.

Any guess what is happening?

Tested on:

  • Windows 10
  • Visual Studio 2019 (MSVC 14.29.30133)
  • Project Configuration: Release x64 (O2, Whole Program Optimization enabled).

Thanks

Check main, without_overloading and with_overloading functions.

#include <random>
#include <vector>
#include <iostream>
#include <ctime>

/*
I took the code for calculating times from this post:

https://stackoverflow.com/questions/17432502/how-can-i-measure-cpu-time-and-wall-clock-time-on-both-linux-windows/17440673#17440673
*/
//  Windows
#ifdef _WIN32
#include <Windows.h>
double get_wall_time() {
    LARGE_INTEGER time, freq;
    if (!QueryPerformanceFrequency(&freq)) {
        //  Handle error
        return 0;
    }
    if (!QueryPerformanceCounter(&time)) {
        //  Handle error
        return 0;
    }
    return (double)time.QuadPart / freq.QuadPart;
}
double get_cpu_time() {
    FILETIME a, b, c, d;
    if (GetProcessTimes(GetCurrentProcess(), &a, &b, &c, &d) != 0) {
        //  Returns total user time.
        //  Can be tweaked to include kernel times as well.
        return
            (double)(d.dwLowDateTime |
                ((unsigned long long)d.dwHighDateTime << 32)) * 0.0000001;
    }
    else {
        //  Handle error
        return 0;
    }
}
//  Posix/Linux
#else
#include <time.h>
#include <sys/time.h>
double get_wall_time() {
    struct timeval time;
    if (gettimeofday(&time, NULL)) {
        //  Handle error
        return 0;
    }
    return (double)time.tv_sec + (double)time.tv_usec * .000001;
}
double get_cpu_time() {
    return (double)clock() / CLOCKS_PER_SEC;
}
#endif

//=================================================================================
const size_t size = 100000000;

struct vec {
    int x, y, z;
    vec& operator*=(int rhs) {
        x *= rhs;
        y *= rhs;
        z *= rhs;
        return *this;
    }
};
    
void without_overloading(std::vector<vec> const &points) {
    double wall0 = get_wall_time();
    double cpu0 = get_cpu_time();

    vec sum{ 0,0,0 };
    for (size_t i = 0; i < size; ++i) {
        sum.x += points[i].x * 2;
        sum.y += points[i].y * 2;
        sum.z += points[i].z * 2;
    }

    double wall1 = get_wall_time();
    double cpu1 = get_cpu_time();

    std::cout << "Wall Time = " << wall1 - wall0 << std::endl;
    std::cout << "CPU Time  = " << cpu1 - cpu0 << std::endl;
    std::cout << sum.x << " " << sum.y << " " << sum.z << std::endl;
}

void with_overloading(std::vector<vec> &points) {
    double wall0 = get_wall_time();
    double cpu0 = get_cpu_time();

    vec sum{ 0,0,0 };
    for (size_t i = 0; i < size; ++i) {
        points[i] *= 2;
        sum.x += points[i].x;
        sum.y += points[i].y;
        sum.z += points[i].z;
    }

    double wall1 = get_wall_time();
    double cpu1 = get_cpu_time();

    std::cout << "Overloading Wall Time = " << wall1 - wall0 << std::endl;
    std::cout << "Overloading CPU Time  = " << cpu1 - cpu0 << std::endl;
    std::cout << sum.x << " " << sum.y << " " << sum.z << std::endl;
}

int main() {
    std::srand(std::time(nullptr));
    std::vector<vec> points(size);
    for (int i = 0; i < size; ++i) {
        points[i] = { rand() % 2, rand() % 2, rand() % 2 };
    }

    without_overloading(points);
    with_overloading(points);
}
Saif_Qaher94
  • 72
  • 11
  • 7
    The functions don't do the same thing. The second one modifies the input vector, while the first does not. Of course there is a cost to storing the modified values. (Whether that explains your observed difference, I am not sure.) – user17732522 Feb 28 '23 at 21:20
  • 1
    Did you test an optimized build or a debug build (which would be pointless)? – Jesper Juhl Feb 28 '23 at 21:29
  • @Saif_Qaher94 thanks for that. user177's comments are even more important here. – JohnFilleau Feb 28 '23 at 21:29
  • @JesperJuhl MSVC full program optimization enabled (they just edited the question). – JohnFilleau Feb 28 '23 at 21:30
  • 2
    From what I see, you do three multiplications and one store in the `with_overloading` version that you don't do in the other version. That's double the amount of multiplications and stores in the "slow" version. Why is it surprising that it gets slower when you do almost the double amount of work in `with_overloading`? – Ted Lyngmo Feb 28 '23 at 21:31
  • 1
    (correction) _"... three multiplications and **three** stores ..."_ – Ted Lyngmo Feb 28 '23 at 21:37
  • 1
    @Saif_Qaher94 `points[i] *= 2;` calls your `operator*=` that does 3 multiplications and stores. (I see that you don't do the rest of the multiplications though in `with_overloading`, but you do the stores) - so, really what @user177 said at start. – Ted Lyngmo Feb 28 '23 at 21:39
  • 1
    A better test would be comparing `sum.x += points[i].x * 2; sum.y += points[i].y; sum.z += points[i].z;` vs `sum += points[i] * 2` with overloads for `+=` and `*` – Kevin Feb 28 '23 at 21:41
  • I overloaded the += and * operators as @Kevin said and I had similar results now. Thanks. – Saif_Qaher94 Feb 28 '23 at 21:52
  • Just wants to add that I got confused because I implemented the overloaded operator* using the implementation of operator*= by taking the lhs as a copy, which was doing additional stores. ' vec operator*(vec lhs, int rhs) { return lhs *= rhs; }; ' So originally sum += points[i] * 2 was performing wasted store operations. – Saif_Qaher94 Feb 28 '23 at 22:12
  • Is there a way to prevents the wasted store and keeps the implementation of operator* using operator*= ? – Saif_Qaher94 Feb 28 '23 at 22:15
  • 1
    It should be noted there is no need to perform the multiplication inside the loop at all. Just do `sum *= 2;` after the loop. A good compiler might do this optimization for you. – paddy Mar 01 '23 at 02:58

0 Answers0