Lambda function slower than separated method

Question

Similar questions have been addressed in "Understanding the overhead of lambda functions in C++11" and "C++0x Lambda overhead". However, the first turned out to be a fluke, and the second compares lambdas with function objects.

I am measuring the performance (compiler: g++; standard: C++17) of a function that computes a simple distance, once using a lambda and once using a separate function. The lambda version is consistently slower, by about 3% in runtime across multiple inputs. The code is below.

Is it possible that the compiler optimizes calls to the separate function but does not do the same for the lambda expression?

#include <vector>
#include <iostream>
#include <chrono>
#include <cstdlib>  // abs
#include <limits>   // numeric_limits

using namespace std;


template <
    class result_t   = std::chrono::milliseconds,
    class clock_t    = std::chrono::steady_clock,
    class duration_t = std::chrono::milliseconds>
auto since(std::chrono::time_point<clock_t, duration_t> const& start)
{
    return std::chrono::duration_cast<result_t>(clock_t::now() - start);
}

int manhattanDistance(int x1, int y1, int x2, int y2) {
    return abs(x1 - x2) + abs(y1 - y2);
}

int fWithoutLambda(int x, int y, vector<vector<int>>& points) {
    int min_d = numeric_limits<int>::max();
    int idx = -1;
    
    for (int i = 0; i < points.size(); i++) {
        auto d = manhattanDistance(x, y, points[i][0], points[i][1]);
        if ((x == points[i][0] || y == points[i][1]) && d < min_d) {
            idx = i;
            min_d = d;
        }
    }
    
    return idx;
}

int fWithLambda(int x, int y, vector<vector<int>>& points) {
    auto manh = [x, y](int x2, int y2) {return abs(x - x2) + abs(y - y2);};
    
    int min_d = numeric_limits<int>::max();
    int idx = -1;
    
    for (int i = 0; i < points.size(); i++) {
        auto d = manh(points[i][0], points[i][1]);
        if ((x == points[i][0] || y == points[i][1]) && d < min_d) {
            idx = i;
            min_d = d;
        }
    }
    
    return idx;
}

int main() {

    size_t repeats = 10;
    int time_no_lambda{0};
    int time_lambda{0};

    for (size_t j = 0; j < repeats; ++j) {
        int n = 1000000;
        vector<vector<int>> v(n);
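        // Fill point i with {i, i % 10}; an explicit index avoids reading and
        // incrementing the same counter within one expression.
        for (int i = 0; i < n; ++i) {
            v[i] = {i, i % 10};
        }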

        int x = 10, y = 5;

        auto start = std::chrono::steady_clock::now();  
        fWithLambda(x, y, v);
        auto tim_0 = since(start).count();
        time_lambda += tim_0;
        fWithoutLambda(x, y, v);
        time_no_lambda += (since(start).count() - tim_0);
    }

    // Each label is paired with its own accumulator (they were swapped before).
    std::cout << "Elapsed(ms), lambda =" << (time_lambda/repeats) << std::endl;
    std::cout << "Elapsed(ms), no lambda =" << (time_no_lambda/repeats) << std::endl;
}

They compile to the same assembly code: https://godbolt.org/z/9WqfMbE39 (with gcc 12 at least). This is probably an artefact of the small sample size or some cache effect, since you time one after the other. What exact compiler version are you using? — Artyer, Jul 06 '22 at 08:43
g++ 12.1 compiles both versions to the [exact same assembly](https://godbolt.org/z/oW1K4Kh5h) on `-O2` or higher, so what you're seeing are probably hardware side effects — perivesta, Jul 06 '22 at 10:17
@Artyer Apple clang version 13.1.6. Good resource to check the assembly code, it sort of settles the suspicion of this having anything to do with the lambda function syntax, unless this result is specific to gcc versions (which seems rather unlikely) — A. Fenzry, Jul 07 '22 at 01:36
@Jesper no, I'm using the default options for gcc 13.1.6 and using the c++17 standard — A. Fenzry, Jul 07 '22 at 01:38
@A.Fenzry Then I suggest that the first thing you do is build your program with compiler optimizations enabled. You'll most likely see that you then get the same performance in both cases. By default, compilers optimize for a good debugging experience and not for performance; you have to tell them explicitly that you want performance over debuggability. — Jesper Juhl, Jul 10 '22 at 17:43
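
The comments above point at measurement effects rather than the lambda itself: a small sample, one fixed call order, and a build without optimizations. A minimal sketch of a fairer comparison follows, assuming an optimized build such as `g++ -O2 -std=c++17`. The names `Points`, `nearestWithFunction`, `nearestWithLambda`, `median`, `Timings`, and `compareTimings` are hypothetical and do not come from the question. The sketch alternates which version runs first, reports medians in microseconds instead of summed milliseconds, and accumulates the returned indices so the calls are less likely to be optimized away.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <limits>
#include <vector>

using Points = std::vector<std::vector<int>>;

// Free-function distance, as in the question.
int manhattanDistance(int x1, int y1, int x2, int y2) {
    return std::abs(x1 - x2) + std::abs(y1 - y2);
}

// Hypothetical name: same search as fWithoutLambda in the question.
int nearestWithFunction(int x, int y, const Points& points) {
    int min_d = std::numeric_limits<int>::max();
    int idx = -1;
    for (std::size_t i = 0; i < points.size(); ++i) {
        int d = manhattanDistance(x, y, points[i][0], points[i][1]);
        if ((x == points[i][0] || y == points[i][1]) && d < min_d) {
            idx = static_cast<int>(i);
            min_d = d;
        }
    }
    return idx;
}

// Hypothetical name: same search as fWithLambda in the question.
int nearestWithLambda(int x, int y, const Points& points) {
    auto manh = [x, y](int x2, int y2) { return std::abs(x - x2) + std::abs(y - y2); };
    int min_d = std::numeric_limits<int>::max();
    int idx = -1;
    for (std::size_t i = 0; i < points.size(); ++i) {
        int d = manh(points[i][0], points[i][1]);
        if ((x == points[i][0] || y == points[i][1]) && d < min_d) {
            idx = static_cast<int>(i);
            min_d = d;
        }
    }
    return idx;
}

// Median of the samples (taken by value because nth_element reorders them).
long long median(std::vector<long long> samples) {
    auto mid = samples.begin() + samples.size() / 2;
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

struct Timings {
    long long lambda_us;    // median time of the lambda version, microseconds
    long long function_us;  // median time of the free-function version
    long long checksum;     // sum of returned indices, keeps the results "used"
};

Timings compareTimings(int n, int runs) {
    assert(n > 0 && runs > 0);
    Points points(n);
    for (int i = 0; i < n; ++i) points[i] = {i, i % 10};

    using Clock = std::chrono::steady_clock;
    std::vector<long long> lambda_samples, function_samples;
    long long checksum = 0;

    // Untimed warm-up so neither version pays for cold caches.
    checksum += nearestWithLambda(10, 5, points);
    checksum += nearestWithFunction(10, 5, points);

    for (int r = 0; r < runs; ++r) {
        for (int k = 0; k < 2; ++k) {
            // Even rounds time the lambda first, odd rounds the function first.
            bool use_lambda = ((r + k) % 2 == 0);
            auto start = Clock::now();
            int idx = use_lambda ? nearestWithLambda(10, 5, points)
                                 : nearestWithFunction(10, 5, points);
            auto stop = Clock::now();
            checksum += idx;
            auto us = std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
            (use_lambda ? lambda_samples : function_samples).push_back(us);
        }
    }
    return {median(lambda_samples), median(function_samples), checksum};
}
```

Calling `compareTimings(1000000, 21)` from `main` and printing `lambda_us` and `function_us` gives one median per version. If the two still differ by a few percent, noise is a more likely cause than the lambda syntax, given that the comments report identical assembly for both versions with g++ 12 at `-O2`.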
