Suppose we have a Fortran function (for example, a mathematical optimization algorithm) that takes another Fortran function as input:

myOptimizer(func)

Now depending on the user's choice, the input function could be from a list of several different functions. This list of choices can be implemented via an if-block:

! xopt is a placeholder variable that receives the optimizer's result
if (userChoice == 'func1') then
    xopt = myOptimizer(func1)
else if (userChoice == 'func2') then
    xopt = myOptimizer(func2)
else if (userChoice == 'func3') then
    xopt = myOptimizer(func3)
end if

Alternatively, I could define a function pointer and write this as:

! func is a procedure pointer; xopt is a placeholder for the optimizer's result
if (userChoice == 'func1') then
    func => func1
else if (userChoice == 'func2') then
    func => func2
else if (userChoice == 'func3') then
    func => func3
end if
xopt = myOptimizer(func)

Based on my tests with Intel Fortran Compiler 2017 with the O2 flag, the second implementation is 4-5 times slower than the if-block implementation. From a software development perspective, I would strongly prefer the second approach, since it results in much more concise and cleaner code, at least in my problem, where there is a fixed workflow with different possible input functions. However, performance matters just as much in this problem.

Is this performance loss from indirect function calls expected in all Fortran code, or is it a compiler-dependent issue? Is there a way to use indirect function calls without the performance loss? How about other languages such as C/C++?

Scientist
  • Can you prepare a complete example (a minimal, complete, and verifiable example)? It could be important whether the interfaces are explicit, what the characteristics are, and so on. – francescalus Nov 16 '17 at 21:19

1 Answer

This is a pure guess based on how compilers generally work and what might explain the 4-5x perf difference.

In the first version, maybe the compiler is inlining myOptimizer() into each call site with func1, func2, and func3 inlined into the optimizer, so when it runs there's no actual function pointer or function call happening.

An indirect function-call isn't much more expensive than a regular function call on modern x86 hardware. It's the lack of inlining that really hurts, especially for FP code. Spilling / reloading all the floating-point registers around a function call is expensive, especially if the function is fairly small.

In other words, what's probably hurting you is that with your 2nd version the compiler no longer sees through the indirection, so nothing gets inlined. This would be true in C / C++ as well.

Hand-holding your compiler into making fast asm probably means you have to write it the first way, unless there's a profile-guided optimization option you can use that might make the compiler realize this is a hot spot and it's worth trying harder with the source written the 2nd way. (Sorry I don't use Fortran, and I only know a few of the options for Intel's C/C++ compiler from looking at its asm output vs. gcc and clang on http://gcc.godbolt.org/)


To see if my hypothesis is right, check the compiler-generated asm. If the first version doesn't actually pass a function pointer to a stand-alone definition of myOptimizer, but the 2nd one does, that's probably all there is to it.
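For that check, a small self-contained test case helps. The following is only a minimal sketch, not the asker's actual code: the objective functions, the `scalar_func` interface, the grid-search body of `myOptimizer`, and the variables `direct` and `indirect` are all assumptions made up for illustration. It puts both call styles in one program so their generated code can be compared side by side.

```fortran
module optimizer_demo
    implicit none
    private
    public :: scalar_func, func1, func2, myOptimizer

    ! Explicit interface shared by every candidate objective function
    ! (hypothetical: the real signatures are not given in the question).
    abstract interface
        function scalar_func(x) result(y)
            real, intent(in) :: x
            real :: y
        end function scalar_func
    end interface

contains

    function func1(x) result(y)
        real, intent(in) :: x
        real :: y
        y = (x - 1.0)**2
    end function func1

    function func2(x) result(y)
        real, intent(in) :: x
        real :: y
        y = (x + 2.0)**2
    end function func2

    ! Stand-in for the real optimizer: a coarse grid search on [-5, 5]
    ! that returns the grid point with the smallest function value.
    function myOptimizer(func) result(xbest)
        procedure(scalar_func) :: func
        real :: xbest
        real :: x, f, fbest
        integer :: i

        xbest = -5.0
        fbest = func(xbest)
        do i = 1, 1000
            x = -5.0 + 0.01 * real(i)
            f = func(x)
            if (f < fbest) then
                fbest = f
                xbest = x
            end if
        end do
    end function myOptimizer

end module optimizer_demo

program compare_dispatch
    use optimizer_demo
    implicit none
    procedure(scalar_func), pointer :: func => null()
    character(len=5) :: userChoice
    real :: direct, indirect

    userChoice = 'func2'

    ! Variant 1: each branch names the function at the call site,
    ! so the compiler knows the callee at compile time.
    if (userChoice == 'func1') then
        direct = myOptimizer(func1)
    else
        direct = myOptimizer(func2)
    end if

    ! Variant 2: the callee is chosen through a procedure pointer
    ! and is only known at run time.
    if (userChoice == 'func1') then
        func => func1
    else
        func => func2
    end if
    indirect = myOptimizer(func)

    ! Both variants should report a minimizer near x = -2 for func2.
    print *, 'direct:  ', direct
    print *, 'indirect:', indirect
end program compare_dispatch
```

Compiling this with optimization and the `-S` option (accepted by both `ifort` and `gfortran`) writes the assembly to a file. The thing to look for is whether Variant 1 still contains a call to a stand-alone `myOptimizer` with a function address as its argument, or whether the loop and the objective function have been merged into the caller.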

See "How to remove "noise" from GCC/clang assembly output?" for more about looking at compiler output. Matt Godbolt's CppCon2017 talk, "What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid", is a good intro to reading compiler output and why you might want to.

Peter Cordes
  • You can check that directly at the same godbolt page by using `-x f95` or `-x fortran` (I am not sure, but I think the former). I have always found GIMPLE dramatically easier to read than assembly, but I am not sure if inlining is there. – Vladimir F Героям слава Nov 16 '17 at 22:41
  • @VladimirF: Thanks for the `-x f95` tip. ICC / ifort doesn't actually use GIMPLE, though; it's one of gcc's internal representations. I've never gotten around to learning GIMPLE; reading x86 asm is necessary to check if the compiler really did a good job (for things other than inlining), and knowing x86 asm and what's efficient lets you figure out what direction to hand-hold the compiler in. – Peter Cordes Nov 16 '17 at 22:48
  • Right, the flags are not really usable for Intel. – Vladimir F Героям слава Nov 17 '17 at 06:33