As is so common with Fortran, I'm writing a massively parallel scientific code. At the start of the code I read a configuration file that tells me which type of solver to use. That means that in a subroutine (during the main run) I have

if(solver.eq.1)then
  call solver1()
elseif(solver.eq.2)then
  call solver2()
else
  call solver3()
endif

Edit to avoid some confusion: this if block is inside my time integration loop, and I have another one that is inside 3 nested loops.

Now my question is: wouldn't it be more efficient to use function pointers instead, given that the solver variable does not change during execution, except in the initialisation procedure?

Obviously function pointers (procedure pointers) are a Fortran 2003 feature. That shouldn't be a problem as long as I use gfortran 4.6. But I mainly use a BlueGene/P; there is an F2003 compiler for it, so I suppose it will work there as well, although I couldn't find any conclusive evidence on the web.
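For illustration, the procedure-pointer variant being asked about could look roughly like the sketch below. The module name, the `select_solver` routine, the `solve` pointer and the hard-coded solver number are all assumptions made for the example; only the three solver names come from the snippet above.

```fortran
module solvers
  implicit none

  ! Common interface shared by all solvers (assumed: no arguments).
  abstract interface
    subroutine solver_iface()
    end subroutine solver_iface
  end interface

  ! Points at the solver chosen during initialisation.
  procedure(solver_iface), pointer :: solve => null()

contains

  subroutine solver1()
    print *, 'solver1'
  end subroutine solver1

  subroutine solver2()
    print *, 'solver2'
  end subroutine solver2

  subroutine solver3()
    print *, 'solver3'
  end subroutine solver3

  ! Called once, after the configuration file has been read.
  subroutine select_solver(solver)
    integer, intent(in) :: solver
    if (solver == 1) then
      solve => solver1
    else if (solver == 2) then
      solve => solver2
    else
      solve => solver3
    end if
  end subroutine select_solver

end module solvers

program main
  use solvers
  implicit none
  integer :: step

  call select_solver(2)   ! hypothetical value; would come from the config file
  do step = 1, 3          ! stands in for the time integration loop
    call solve()          ! indirect call, no if test in the loop
  end do
end program main
```

The branch is taken once in `select_solver`; every later `call solve()` is an indirect call through the pointer.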

Azrael3000
  • If you are working on a BlueGene, why aren't you using XLFortran? It is very well documented; you could start here: http://www-01.ibm.com/software/awdtools/fortran/ – High Performance Mark Mar 22 '12 at 20:00
  • Sorry for not being explicit. Of course I use XLFortran on the BG. And why did I not find this page today? I must have been looking for the wrong keywords. It clearly states that procedure pointers are OK. Thx. – Azrael3000 Mar 22 '12 at 20:02
  • For this specific situation I'd use a preprocessor and get the conditionals out of the executable altogether. – agentp Oct 17 '16 at 20:10

4 Answers

3

Knowing nothing about Fortran, this is my answer: the main problem with branches is that a CPU potentially cannot speculatively execute code across them. To mitigate this problem, branch prediction was introduced (which is very sophisticated in modern CPUs).

Indirect calls through a function pointer can be a problem for the prediction unit of the CPU. If it can't predict where the call will actually go, this will stall the pipeline.

I am quite sure that the CPU will correctly predict your branch, because it always goes the same way during a run, which is a trivial case for the predictor.

Maybe the CPU can speculate across the indirect call, maybe it can't. This is why you need to test which is better.

If it cannot, you will certainly notice in your benchmark.

In addition, maybe you can hoist the if test out of your inner loop so that it is not evaluated as often. This would make the actual performance of the branch irrelevant.
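A minimal sketch of what hoisting could look like in Fortran, assuming the solvers act on one element of a 3D array at a time. The array `u`, its size, the solver bodies and the hard-coded solver number are made up for the example.

```fortran
program hoist_demo
  implicit none
  integer, parameter :: n = 4
  integer :: solver, i, j, k
  real :: u(n, n, n)

  solver = 2   ! hypothetical value; would come from the config file
  u = 0.0

  ! The test on `solver` is done once, outside the three nested loops.
  select case (solver)
  case (1)
    do k = 1, n
      do j = 1, n
        do i = 1, n
          call solver1(u(i, j, k))
        end do
      end do
    end do
  case (2)
    do k = 1, n
      do j = 1, n
        do i = 1, n
          call solver2(u(i, j, k))
        end do
      end do
    end do
  case default
    do k = 1, n
      do j = 1, n
        do i = 1, n
          call solver3(u(i, j, k))
        end do
      end do
    end do
  end select

  print *, 'sum(u) =', sum(u)

contains

  ! Placeholder solver bodies, for illustration only.
  subroutine solver1(x)
    real, intent(inout) :: x
    x = x + 1.0
  end subroutine solver1

  subroutine solver2(x)
    real, intent(inout) :: x
    x = x + 2.0
  end subroutine solver2

  subroutine solver3(x)
    real, intent(inout) :: x
    x = x + 3.0
  end subroutine solver3

end program hoist_demo
```

The cost is the duplicated loop nest, one copy per solver, in exchange for a branch-free loop body.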

usr
  • Seeing your edit, I can only stress my last recommendation: duplicate the inner loops three times and do the if test outside of that loop. This will move the branch off the hot path. – usr Mar 22 '12 at 22:14
  • Clearly that is an option. But that will make the code unreadable if, for example, you have two different properties with 3 options each, which makes 9 different ifs to maintain. – Azrael3000 Mar 24 '12 at 10:34
  • Maybe you can push the loop into the solver. Then you can call the solver through a function pointer and still have no performance loss. – usr Mar 24 '12 at 11:14
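The last suggestion, pushing the loops into the solver, could be sketched as follows. The interface name `field_solver`, the array argument and the solver bodies are assumptions for the example, not the asker's actual code.

```fortran
module field_solvers
  implicit none

  ! Each solver receives the whole field and loops over it internally.
  abstract interface
    subroutine field_solver(u)
      real, intent(inout) :: u(:, :, :)
    end subroutine field_solver
  end interface

contains

  subroutine solver1(u)
    real, intent(inout) :: u(:, :, :)
    integer :: i, j, k
    do k = 1, size(u, 3)
      do j = 1, size(u, 2)
        do i = 1, size(u, 1)
          u(i, j, k) = u(i, j, k) + 1.0   ! placeholder update
        end do
      end do
    end do
  end subroutine solver1

  subroutine solver2(u)
    real, intent(inout) :: u(:, :, :)
    integer :: i, j, k
    do k = 1, size(u, 3)
      do j = 1, size(u, 2)
        do i = 1, size(u, 1)
          u(i, j, k) = u(i, j, k) + 2.0   ! placeholder update
        end do
      end do
    end do
  end subroutine solver2

end module field_solvers

program loop_in_solver
  use field_solvers
  implicit none
  procedure(field_solver), pointer :: solve => null()
  real :: u(4, 4, 4)
  integer :: step

  u = 0.0
  solve => solver2        ! hypothetical choice; would follow the config file

  do step = 1, 10         ! time integration loop
    call solve(u)         ! one indirect call per step, not per grid point
  end do

  print *, 'sum(u) =', sum(u)
end program loop_in_solver
```

With this layout the indirect call happens once per time step, so its cost is spread over the whole loop nest inside the solver.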
2

If you only plan to use the function pointers once, at initialisation, and you are running codes on a BlueGene, isn't your concern for efficiency misdirected? Generally, any initialisation which works is OK; if it takes 1 s instead of 1 ms, it's probably going to have zero impact on total execution time.

Code initialisation routines for clarity, ease of modification, that sort of thing.

EDIT

My guess is that using function pointers rather than your current code will have no impact on execution speed. But it's just a (perhaps educated) guess, and I'll be very interested in any data you gather on this question.

High Performance Mark
  • Sorry, now that I read my question again I see your confusion. I will update it in a second. – Azrael3000 Mar 22 '12 at 20:16
  • Okay. Well, I'll wait and see if somebody has some insight to share. Otherwise I'll try to scrape together my scarce time to do some benchmarks. I'll post them here. – Azrael3000 Mar 22 '12 at 21:37
1

After a brief search I couldn't find the answer to the question, so I ran a little benchmark myself (see this link for the Makefile & dependencies). The benchmark consists of:

  • Draw random number to select method a, b, or c, which all perform a simple addition to their single integer argument
  • Call the chosen method 100 million times, using either a procedure pointer or if-statements
  • Repeat the above 5 times

The result with gfortran 4.8.5 on an Intel Xeon E5-2630 v3 CPU @ 2.40 GHz is:

Time per call (proc. pointer):       1.89 ns
Time per call (if statement):        1.89 ns

In other words, there is not much of a performance difference!
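The linked benchmark code is not reproduced here. As a rough idea of how such a comparison can be set up, the following sketch follows the steps described above; all names are made up, it does a single repetition rather than five, and timings will depend heavily on the compiler and optimisation flags (an optimiser may inline or simplify the if-statement loop).

```fortran
module methods
  implicit none
  abstract interface
    subroutine method(n)
      integer, intent(inout) :: n
    end subroutine method
  end interface
contains
  subroutine a(n)
    integer, intent(inout) :: n
    n = n + 1
  end subroutine a
  subroutine b(n)
    integer, intent(inout) :: n
    n = n + 2
  end subroutine b
  subroutine c(n)
    integer, intent(inout) :: n
    n = n + 3
  end subroutine c
end module methods

program bench
  use methods
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: i8 = selected_int_kind(18)
  integer, parameter :: ncalls = 100000000
  procedure(method), pointer :: p => null()
  integer :: choice, i, acc
  integer(i8) :: t0, t1, rate
  real :: r
  real(dp) :: ns

  ! Pick method a, b or c at random so the choice is unknown at compile time.
  call random_seed()
  call random_number(r)
  choice = min(3, 1 + int(3.0 * r))
  select case (choice)
  case (1)
    p => a
  case (2)
    p => b
  case default
    p => c
  end select

  ! Variant 1: call through the procedure pointer.
  acc = 0
  call system_clock(t0, rate)
  do i = 1, ncalls
    call p(acc)
  end do
  call system_clock(t1)
  ns = real(t1 - t0, dp) / real(rate, dp) / real(ncalls, dp) * 1.0e9_dp
  print '(a, f8.2, a, i0)', 'Time per call (proc. pointer): ', ns, ' ns, acc = ', acc

  ! Variant 2: select the method with if statements inside the loop.
  acc = 0
  call system_clock(t0, rate)
  do i = 1, ncalls
    if (choice == 1) then
      call a(acc)
    else if (choice == 2) then
      call b(acc)
    else
      call c(acc)
    end if
  end do
  call system_clock(t1)
  ns = real(t1 - t0, dp) / real(rate, dp) / real(ncalls, dp) * 1.0e9_dp
  print '(a, f8.2, a, i0)', 'Time per call (if statement):  ', ns, ' ns, acc = ', acc
end program bench
```

Printing `acc` after each loop keeps the compiler from discarding the calls entirely.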

1

If your solver routines take a non-trivial runtime, then the trivial runtime of the IF statements is likely to be immaterial. If the solver routines have a runtime comparable to that of the IF statement, then the total runtime is very short, so why do you care? This seems an optimization unlikely to pay off.

The first rule of runtime optimization is to profile your code to see which portions are consuming the runtime. Otherwise you are likely to optimize portions that are unimportant, which will accomplish nothing.
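A dedicated profiler (for example gprof, after compiling with gfortran's -pg option) gives the most detail. As a lightweight first check, one can also time regions by hand with the intrinsic `cpu_time`, as in this sketch; the two work routines are placeholders invented for the example.

```fortran
program time_regions
  implicit none
  integer, parameter :: nsteps = 100
  real :: t_start, t_end, t0, t1, t_solver
  real :: u(64, 64, 64)
  integer :: step

  u = 0.0
  t_solver = 0.0

  call cpu_time(t_start)
  do step = 1, nsteps
    ! Accumulate the time spent in the solver alone.
    call cpu_time(t0)
    call solver_placeholder(u)
    call cpu_time(t1)
    t_solver = t_solver + (t1 - t0)

    call other_work(u)
  end do
  call cpu_time(t_end)

  print '(a, f10.4, a)', 'total CPU time : ', t_end - t_start, ' s'
  print '(a, f10.4, a)', 'solver CPU time: ', t_solver, ' s'

contains

  ! Placeholder for the real solver.
  subroutine solver_placeholder(x)
    real, intent(inout) :: x(:, :, :)
    x = x + 1.0
  end subroutine solver_placeholder

  ! Placeholder for everything else done in a time step.
  subroutine other_work(x)
    real, intent(inout) :: x(:, :, :)
    x = 0.5 * x
  end subroutine other_work

end program time_regions
```

If the solver share dominates the total, the dispatch mechanism around it is unlikely to matter.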

For what it's worth, someone else recently had a very similar concern: Fortran Subroutine Pointers for Mismatching Array Dimensions

M. S. B.