"still obey the inline keyword ... would a new frame ... be pushed onto the stack"
That isn't what the inline keyword does in the first place (see this question for extensive reference).
Assuming, as Barry does, that you're hoping to persuade the optimiser to inline your function call (once more for luck: this is nothing to do with the inline keyword), function template + lambda is probably the way to go.
To see why this is, consider what the optimiser has to work with in each of these cases:
function template + lambda
template <typename F>
void run(F frame) { while(!interrupt) frame(); }
// ... call site ...
run([]{ onFrame(); });
Here, the function only exists at all (is instantiated from the template) at the call site, with everything the optimizer needs in scope and well-defined.
Note that the optimizer may still reasonably choose not to inline a call if it thinks the extra instruction cache pressure will outweigh the saving from skipping the call and its stack frame.
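To make that concrete, here is a minimal self-contained sketch of the template + lambda case. The globals (interrupt, frames_left, frames_run), the body of onFrame, and the wrapper run_with_lambda are assumptions invented for illustration so that the loop terminates; only run and the call site come from the snippet above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the question's state: the loop is interrupted
// after three frames so that the example terminates.
static int frames_left = 3;
static int frames_run = 0;
static bool interrupt = false;

void onFrame() {
    ++frames_run;
    if (--frames_left == 0) interrupt = true;
}

template <typename F>
void run(F frame) { while (!interrupt) frame(); }

// Every lambda has its own unique closure type, so run<F> is instantiated
// right here, with the body of the callee visible to the optimiser.
int run_with_lambda() {
    frames_left = 3;
    frames_run = 0;
    interrupt = false;
    run([] { onFrame(); });
    return frames_run;
}
```

Calling run_with_lambda() from main runs three frames and returns. Whether the calls actually get inlined is still up to your compiler and its settings.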
function pointer
void run(void (*frame)()) { while(!interrupt) frame(); }
// ... call site ...
run(onFrame);
Here, run may have to be compiled as a standalone function (although that copy may be thrown away by the linker if it can prove no-one used it), and the same goes for onFrame, especially since its address is taken. Finally, the optimizer may need to consider whether run is called with many different function pointers, or just one, when deciding whether to inline these calls. Overall, it seems like more work, and may end up as a link-time optimisation.
NB. I used "standalone function" to mean the compiler likely emits the code & symbol table entry for a normal free function in both cases.
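A minimal sketch of the function pointer case, for comparison. The counters, the second callback onFrameTwice, and the wrapper run_with_pointer are hypothetical, added only so the example terminates and shows run being reused with more than one pointer.

```cpp
#include <cassert>

// Hypothetical state so that the loops terminate.
static int frames_run = 0;
static bool interrupt = false;

void onFrame() { if (++frames_run == 3) interrupt = true; }
void onFrameTwice() { frames_run += 2; if (frames_run >= 4) interrupt = true; }

// A single non-template function: the same code has to serve every pointer
// passed in, so the callee is only known where the optimiser can see the call site.
void run(void (*frame)()) { while (!interrupt) frame(); }

int run_with_pointer(void (*frame)()) {
    frames_run = 0;
    interrupt = false;
    run(frame);
    return frames_run;
}
```

Here run_with_pointer(onFrame) and run_with_pointer(onFrameTwice) both go through the one definition of run, which is exactly the situation where the optimiser has to decide whether specialising or inlining for each pointer is worth it.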
std::function
This is already getting long. Let's just notice that this class goes to great lengths (the type erasure Barry mentioned) to make the function
void run(std::function<void()> frame);
not depend on the exact type of the callable passed in, which means hiding information from the compiler at the point it generates the code for run, which means less for the optimiser to work with (or conversely, more work required to undo all that careful information hiding).
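As a rough illustration of that information hiding, the sketch below passes two unrelated callables through the same non-template run. The counters and the two wrapper functions are assumptions made up for the example; only the signature of run comes from the text above.

```cpp
#include <cassert>
#include <functional>

static bool interrupt = false;

// One definition of run: by the time it is compiled, the concrete type of
// the callable has been erased behind std::function.
void run(std::function<void()> frame) { while (!interrupt) frame(); }

// Hypothetical free-function callback that stops the loop after two frames.
static int free_calls = 0;
void onFrame() { if (++free_calls == 2) interrupt = true; }

int run_erased_free_function() {
    free_calls = 0;
    interrupt = false;
    run(onFrame);  // a plain function pointer is wrapped
    return free_calls;
}

int run_erased_lambda() {
    int calls = 0;
    interrupt = false;
    // A capturing lambda of a completely different type goes through the same run.
    run([&calls] { if (++calls == 5) interrupt = true; });
    return calls;
}
```

Both wrappers call the identical run, so any inlining of frame() depends on the optimiser seeing through the std::function wrapper at the call site.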
As for testing what your optimiser does, you need to examine this in the context of your whole program: it's free to choose different heuristics depending on code size and complexity.
To be totally sure what it actually did, just disassemble with source or compile to assembler. (Yes, that's potentially a big "just", but it's platform-specific, not really on-topic for the question, and a skill worth learning anyway.)