std::function causes a few kinds of overhead.
First, it is difficult for the compiler to see through. With a raw function pointer, some compilers can "undo" the indirection more easily than they can with std::function. However, in the cases where that matters, using a raw function pointer or a std::function was often a poor choice in the first place.
Second, std::function is typically implemented with a virtual function table (or something equivalent), which results in up to two indirections instead of the one a function pointer needs. This hit is largest when the virtual function table falls out of your CPU's cache.
Third, C++ compilers are great at inlining, and indirection through a std::function blocks that.
Now, in my experience, this overhead is at its worst when you are doing buffer processing, such as a per-pixel operation.
In this case, you can have millions or billions of pixels to work on. The work done per pixel is small, and the overhead of going through a std::function call on each and every pixel ends up being large compared to the actual work done.
The simplest way to solve this (and related) problems is to store a buffer-processing function instead of a per-element function, like this:
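#include <cstdint>
#include <functional>
#include <span>
#include <utility>

using Pixel = std::uint32_t;
using Scanline = std::span<Pixel>;
using ScanlineOp = std::function<void(Scanline)>;

// Wrap a per-pixel callable together with the loop over one scanline.
template<class PixelOp>
ScanlineOp MakeScanlineOp( PixelOp op ) {
    return [op=std::move(op)](Scanline line) {
        for (Pixel& p : line)
            op(p);
    };
}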
Here I take the per-pixel operation and save it, along with the iteration code, in a std::function.
Now when processing a 4000 pixel by 4000 pixel image, instead of suffering std::function overhead 16 million times, I run into it 4000 times, which reduces the cost of the overhead by 99.975%.
Make something 4000x faster a few times and you stop caring about how much it costs.
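To make that arithmetic concrete, here is a minimal sketch that counts how many times control passes through a std::function in each design. The names CountPerPixelCalls and CountPerScanlineCalls are hypothetical, and the image is just a std::vector of rows rather than a real image type. It only counts calls; it does not measure time.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Pixel = std::uint32_t;
using Row = std::vector<Pixel>;   // stand-in for a scanline (assumption)
using Rows = std::vector<Row>;    // stand-in for an image (assumption)

// Per-pixel design: one type-erased call for every pixel.
std::size_t CountPerPixelCalls(Rows& image,
                               const std::function<void(Pixel&)>& op) {
    std::size_t calls = 0;
    for (Row& row : image)
        for (Pixel& p : row) {
            op(p);
            ++calls;
        }
    return calls;
}

// Per-scanline design: one type-erased call per row. The inner loop lives
// inside the stored callable, where the compiler can see the pixel work.
std::size_t CountPerScanlineCalls(Rows& image,
                                  const std::function<void(Row&)>& op) {
    std::size_t calls = 0;
    for (Row& row : image) {
        op(row);
        ++calls;
    }
    return calls;
}
```

For an image 40 pixels wide and 30 rows high, the first function reports 1200 calls and the second reports 30. The ratio is always the image width, which is where the factor of 4000 comes from.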
Now, std::span is a C++20 type, so it is not available in C++11. Here is a toy version:
#include <cstddef>
#include <iterator>
#include <type_traits>
#include <utility>

// A pair of iterators with the minimum interface of a range.
template<class It>
struct range {
    It b, e;
    using reference = typename std::iterator_traits<It>::reference;
    using value_type = typename std::iterator_traits<It>::value_type;
    range( It s, It f ):b(s), e(f) {}
    It begin() const { return b; }
    It end() const { return e; }
    bool empty() const { return begin()==end(); }
    reference front() const { return *begin(); }
};
// Adds the operations that need random-access iterators.
template<class It>
struct random_range:range<It> {
    using range<It>::range;
    using reference = typename range<It>::reference;
    reference back() const { return *std::prev(this->end()); }
    std::size_t size() const { return this->end()-this->begin(); }
    reference operator[](std::size_t i) const{ return this->begin()[i]; }
};
template<class T>
struct array_view:random_range<T*> {
    array_view( T* start, T* finish ):random_range<T*>(start, finish) {}
    array_view( T* start, std::size_t length ):array_view(start, start+length) {}
    array_view():array_view(nullptr, nullptr) {}

    // The element type that C::data() points at.
    template<class C>
    using data_type = typename std::remove_pointer< decltype( std::declval<C>().data() )>::type;

    // True when U is T up to cv-qualification and U* converts to T*.
    template<class U>
    static constexpr bool pointer_compatible() {
        return
            std::is_same<
                typename std::decay<U>::type,
                typename std::decay<T>::type
            >::value
            && std::is_convertible<U*, T*>::value;
    }

    // accept any container whose .data() returns a compatible pointer
    template<class C,
        typename std::enable_if< pointer_compatible<data_type<C>>(), bool >::type = true
    >
    array_view( C&& c ):array_view(c.data(), c.size()) {}
};
The complex part is the constructor that accepts a vector or an array: it is enabled only when the container has a .data() member function that returns a compatible pointer.
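As a rough illustration of how that constrained constructor behaves, here is a reduced stand-in. mini_view and Sum are hypothetical names, and mini_view keeps only what the example needs; it inlines the same two checks (same element type up to const, and a convertible pointer) rather than going through a helper function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Reduced stand-in for array_view (assumption: not the full class).
template<class T>
struct mini_view {
    T* b = nullptr;
    T* e = nullptr;

    mini_view(T* start, std::size_t length) : b(start), e(start + length) {}

    // Element type reported by C::data(), e.g. int for std::vector<int>&.
    template<class C>
    using data_type = typename std::remove_pointer<
        decltype(std::declval<C>().data())>::type;

    // Enabled only when C has .data() whose element type matches T up to
    // cv-qualification and whose pointer converts to T*.
    template<class C,
        typename std::enable_if<
            std::is_same<typename std::decay<data_type<C>>::type,
                         typename std::decay<T>::type>::value
            && std::is_convertible<data_type<C>*, T*>::value,
            bool>::type = true>
    mini_view(C&& c) : mini_view(c.data(), c.size()) {}

    T* begin() const { return b; }
    T* end() const { return e; }
    std::size_t size() const { return static_cast<std::size_t>(e - b); }
};

// Hypothetical consumer: vectors and arrays of int convert implicitly.
inline long Sum(mini_view<const int> v) {
    long total = 0;
    for (int x : v) total += x;
    return total;
}
```

With this, Sum accepts a std::vector<int> or a std::array<int, N> directly. A mini_view<int> cannot be built from a const std::vector<int>, because const int* does not convert to int*, and it cannot be built from a std::vector<long>, because the element types differ.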
You'd convert code that looks like:
void foreachPixel( PixelOp op, Image img ) {
    for (int i = 0; i < img.height(); ++i)
        for (int j = 0; j < img.width(); ++j)
            op(img[i][j]);
}
to
void foreachPixel( ScanlineOp op, Image img ) {
    for (int i = 0; i < img.height(); ++i)
        op(img.Scanline(i));
}
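For a version that can be compiled end to end, here is a sketch under some assumptions. The Image class is hypothetical (row-major pixels in one contiguous buffer), ScanlineView is a tiny stand-in for std::span<Pixel> so the code builds before C++20, and the image is taken by reference so that the caller's pixels are the ones modified.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Pixel = std::uint32_t;

// Minimal stand-in for std::span<Pixel> (assumption).
struct ScanlineView {
    Pixel* first;
    std::size_t count;
    Pixel* begin() const { return first; }
    Pixel* end() const { return first + count; }
};

using ScanlineOp = std::function<void(ScanlineView)>;

// Hypothetical image type: row-major pixels in one contiguous buffer.
class Image {
public:
    Image(int width, int height)
        : width_(width), height_(height),
          pixels_(static_cast<std::size_t>(width) *
                  static_cast<std::size_t>(height), 0) {}
    int width() const { return width_; }
    int height() const { return height_; }
    ScanlineView Scanline(int row) {
        const std::size_t w = static_cast<std::size_t>(width_);
        return { pixels_.data() + static_cast<std::size_t>(row) * w, w };
    }
private:
    int width_;
    int height_;
    std::vector<Pixel> pixels_;
};

// Store the per-pixel callable together with the loop over one scanline.
template<class PixelOp>
ScanlineOp MakeScanlineOp(PixelOp op) {
    return [op](ScanlineView line) {
        for (Pixel& p : line)
            op(p);
    };
}

// One std::function call per row instead of one per pixel.
void foreachPixel(const ScanlineOp& op, Image& img) {
    for (int i = 0; i < img.height(); ++i)
        op(img.Scanline(i));
}
```

A call such as foreachPixel(MakeScanlineOp([](Pixel& p) { p = 0xFF00FF00u; }), img) then fills every pixel while going through the std::function only img.height() times.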
Now, what I've demonstrated is aimed at one concrete case. The general idea is that you can inject some of the low-level control flow into your std::function and operate one level higher, and thus remove almost all of the std::function overhead.