11

The following function will randomly "sprinkle salt" on a loaded image. For the sake of boosting performance, the conditional statement

uint j = rows == 1 ? 0 : randomRow(generator);

should not be inside the loop.

Instead, I want to define a lambda getJ before the loop as

auto getJ = rows == 1 ? []() {return 0; } : []() {return randomRow(generator); };

However, my code with this lambda does not compile with the following red squiggled text:

enter image description here

Question

How to conditionally define such a lambda?

void salt_(Mat mat, unsigned long long n)
{
    const uchar channels = mat.channels();
    uint cols = mat.cols;
    uint rows = mat.rows;

    if (mat.isContinuous())
    {
        cols *= rows;
        rows = 1;
    }

    default_random_engine generator;
    uniform_int_distribution<uint> randomRow(0, rows - 1);
    uniform_int_distribution<uint> randomCol(0, cols - 1);



    // auto getJ = rows == 1 ? []() {return 0; } : []() {return randomRow(generator); };


    uchar * const data = mat.data;

    for (unsigned long long counter = 0; counter < n; counter++)
    {
        uint i = randomCol(generator);
        uint j = rows == 1 ? 0 : randomRow(generator);
        //uint j = getJ();

        uint index = channels * (cols * j + i);
        for (uchar k = 0; k < channels; k++)
            data[index + k] = 255;
    }
}
Second Person Shooter
  • 14,188
  • 21
  • 90
  • 165
  • 3
    "my code with this lambda does not compile" - What is the error you get? – Suma Apr 16 '19 at 12:35
  • You are trying to reference local variable `generator` without capturing it. If second lambda was capture-free it would compile. – user7860670 Apr 16 '19 at 12:36
  • 5
    Seems like a premature optimisation. Surely function call overhead would be greater than ternary overhead? – Artyer Apr 16 '19 at 12:40
  • 2
    Is a conditional *really* a performance issue here? Have you profiled the code? – Jesper Juhl Apr 16 '19 at 12:42
  • @JesperJuhl: Yes. I always do benchmark. – Second Person Shooter Apr 16 '19 at 12:43
  • 1
    @Artyer Done well, the call to the lambda can be inlined. – Angew is no longer proud of SO Apr 16 '19 at 12:43
  • Benchmark is always right, but this is certainly not what I would expect. What is your platform? My experience is function call will not be faster for most desktop CPUs than a condition. You could create a template from your loop and pass `getJ` as a functor to it - that way there would be no overhead for it. – Suma Apr 16 '19 at 12:47
  • @Suma The great thing about lambdas is that the compiler knows exactly what it does so it is much easier to inline. In this case there most likely wont be a function call and the `operator()` will be inlined into the code. https://stackoverflow.com/questions/40809985/are-functors-actually-faster-than-pointers-to-functions – NathanOliver Apr 16 '19 at 12:51
  • @NathanOliver This assumes you pass your lambda into a function template. I doubt you can get this benefit when passing it as a std::function or a function pointer parameter into an ordinary function. – Suma Apr 16 '19 at 13:04
  • @Suma Correct. But that is how you should be passing them. Using a function pointer doesn't let you have state and `std::function` is expensive and is really only needed if you need to store the function for later like as a class member. – NathanOliver Apr 16 '19 at 13:13
  • It seems like you are trying to manually [unswitch the loop](https://en.wikipedia.org/wiki/Loop_unswitching). Why didn't your compiler do that automatically? – Kevin Apr 16 '19 at 17:22
  • @Kevin: Thanks for the useful information! – Second Person Shooter Apr 16 '19 at 21:07

3 Answers3

10

my code with this lambda does not compile with the following red squiggled text

You cannot use randomRow inside the body of the lambda expression without capturing it beforehand, as the generated closure object needs to have access to it.

Even if you were to use [&randomRow], the code would still fail to compile as every lambda expression produces a closure of unique type, even if the lambda expressions are exactly the same.

You can turn the problem on its head to avoid any overhead and achieve what you want - create a function that takes the lambda you want to invoke:

template <typename F>
void saltImpl(F&& getJ, /* ... */)
{
    uchar * const data = mat.data;

    for (unsigned long long counter = 0; counter < n; counter++)
    {
        uint i = randomCol(generator);
        uint j = rows == 1 ? 0 : randomRow(generator);
        //uint j = getJ();

        uint index = channels * (cols * j + i);
        for (uchar k = 0; k < channels; k++)
            data[index + k] = 255;
    }
}

Usage example:

void salt_(Mat mat, unsigned long long n)
{
    const uchar channels = mat.channels();
    uint cols = mat.cols;
    uint rows = mat.rows;

    if (mat.isContinuous())
    {
        cols *= rows;
        rows = 1;
    }

    default_random_engine generator;
    uniform_int_distribution<uint> randomRow(0, rows - 1);
    uniform_int_distribution<uint> randomCol(0, cols - 1);

    if (rows == 1)
    {
        saltImpl([]{ return 0; }, /* ... */);
    }
    else
    {
        saltImpl([&]{ return randomRow(generator); }, /* ... */)
    }
}
Vittorio Romeo
  • 90,666
  • 33
  • 258
  • 416
3

Why this fails is because the lambdas are of a different type. That's natural, their operator() have different definitions. Which means you want your following code to work with two different types. And the C++ way of making code work with different types is using templates.

Convert the code using getJ to a function template (it can be local to your implementation file), like this:

template <class G>
void salt_impl_(Mat mat, unsigned long long n, default_random_engine &generator, G getJ)
{
    const uchar channels = mat.channels();
    uint cols = mat.cols;
    uint rows = mat.rows;

    if (mat.isContinuous())
    {
        cols *= rows;
        rows = 1;
    }

    uchar * const data = mat.data;

    uniform_int_distribution<uint> randomCol(0, cols - 1);

    for (unsigned long long counter = 0; counter < n; counter++)
    {
        uint i = randomCol(generator);
        uint j = getJ();

        uint index = channels * (cols * j + i);
        for (uchar k = 0; k < channels; k++)
            data[index + k] = 255;
    }
}


void salt_(Mat mat, unsigned long long n)
{
    const uchar channels = mat.channels();
    uint cols = mat.cols;
    uint rows = mat.rows;

    if (mat.isContinuous())
    {
        cols *= rows;
        rows = 1;
    }

    default_random_engine generator;
    uniform_int_distribution<uint> randomRow(0, rows - 1);

    if (rows == 1)
      salt_impl_(mat, n, generator, []() {return 0; });
    else
      salt_impl_(mat, n, generator, [&]() {return randomRow(generator); });
}

Feel free to reduce the initial-part duplication between the function and the template by passing more parameters, making them members of a class, or something similar.

Also note that the non-trivial lambda must capture the variables which it accesses (randomRow and generator). I did this using the universal by-reference capture [&] in the code above.

Angew is no longer proud of SO
  • 167,307
  • 17
  • 350
  • 455
1

This is not quite the same thing, but you can set a std::function object to different lambdas based on a condition:

#include <functional>
...
std::function<int()> getJ;
if (rows == 1)
    getJ = [](){return 0;};
else
    getJ = [&](){return randomRow(generator);};
wcochran
  • 10,089
  • 6
  • 61
  • 69