
I'm trying to fine-tune some benchmark code we are using, and I'm wondering if there is a way to tell GCC explicitly how to order certain blocks of code. For example, given these blocks of code:

  1. Pre
  2. Start-Timer
  3. Body
  4. Stop-Timer
  5. Post

I wish to tell GCC that each block must be kept in the above order, without any instructions leaking from one block into another. Ideally the timer would measure only Step 3. For practical reasons, however, measuring at least Step 3 and at most Steps 2-4 will suffice. I just want to make sure I'm not measuring any part of Step 1 or 5.

Currently I use a __sync_synchronize in the Timer functions to issue a full memory fence. My hope is that, in addition to being a fence, this function is marked in a way that prevents the compiler from reordering code across it.

Is this call to __sync_synchronize sufficient? Also, according to the text of the standard, would the C++11 fence commands suffice?
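
For reference, the setup described in the question can be sketched roughly as below. This is only an illustration under stated assumptions, not the asker's actual code: the names `fence_gcc`, `fence_cxx11`, `start_timer`, `stop_timer`, `body` and `time_body_ns` are hypothetical, and `std::chrono::steady_clock` is an assumed clock source. The sketch shows where the fences sit. It does not by itself settle whether computation with no memory side effects can still be moved across them.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

using bench_clock = std::chrono::steady_clock;  // assumed clock source

// GCC builtin mentioned in the question: a full memory barrier.
inline void fence_gcc() { __sync_synchronize(); }

// C++11 spelling of a sequentially consistent fence.
inline void fence_cxx11() { std::atomic_thread_fence(std::memory_order_seq_cst); }

// Hypothetical Start-Timer: fence, read the clock, fence again.
inline bench_clock::time_point start_timer() {
    fence_gcc();
    bench_clock::time_point t = bench_clock::now();
    fence_gcc();
    return t;
}

// Hypothetical Stop-Timer, same layout but using the C++11 fence.
inline bench_clock::time_point stop_timer() {
    fence_cxx11();
    bench_clock::time_point t = bench_clock::now();
    fence_cxx11();
    return t;
}

// Hypothetical Body: sums 0..n-1 through a volatile so the loop is not folded away.
std::uint64_t body(std::uint64_t n) {
    volatile std::uint64_t sink = 0;
    for (std::uint64_t i = 0; i < n; ++i) sink = sink + i;
    return sink;
}

// Pre and Post would surround this function at the call site.
std::int64_t time_body_ns(std::uint64_t n, std::uint64_t& result) {
    bench_clock::time_point t0 = start_timer();  // Start-Timer
    result = body(n);                            // Body
    bench_clock::time_point t1 = stop_timer();   // Stop-Timer
    assert(t1 >= t0);                            // steady_clock never goes backwards
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}
```

Calling `time_body_ns(1000, r)` stores the sum of 0..999 in `r` and returns the elapsed nanoseconds between the two clock reads.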

Jonathan Leffler
edA-qa mort-ora-y
  • makes me wonder : why would you want to do that? Also, there is nothing in the standard regarding order of code bits (whatever that is) – BЈовић Jan 29 '12 at 14:23
  • @VJovic, because I need to time the performance of some code. And there _absolutely_ is a lot in the C++11 standard about ordering code (synchronizes-with, happens-before, etc.) – edA-qa mort-ora-y Jan 29 '12 at 14:33
  • I might have misunderstood the question, but if the compiler reorders the code blocks execution, then it is a bug in the compiler, or linker. So, what exactly are you asking? – BЈовић Jan 29 '12 at 14:37
  • The problem is the optimizer. Since the _Body_ may have no data/execution dependencies on the other code the optimizer may decide to move it around since it doesn't violate any "as-if" requirements/visible side-effects. – edA-qa mort-ora-y Jan 29 '12 at 14:42
  • disable the optimizer? Then there will be no reordering. Although some CPUs can also reorder instructions. – rve Jan 29 '12 at 15:01
  • @rve, I'm trying to measure optimized code. The Fence will take care of CPU reorganization (well, at least partially, good enough for x86). – edA-qa mort-ora-y Jan 29 '12 at 15:02
  • You can enable optimization only for the function with the code you are measuring and disable optimization for the pre and post functions. – rve Jan 29 '12 at 15:06
  • Disabling optimisation does not work. I am facing the same problem. The measurement functions are inline, and as part of its optimisation strategy gcc may move some register assignments slightly up or down from where they appear in the source, with the side effect of extending or shrinking what is actually measured. – Philippe F Jan 22 '19 at 10:13

2 Answers


If the Start-Timer is a function call and the Stop-Timer is another function call, the optimizer has little opportunity to move the Body around, or spill material from Pre or Post into Body.

All the side-effects from Pre must be complete before the Start-Timer function is called (there's a sequence point there). All the side effects of Stop-Timer must be complete before executing Post (there's a sequence point there, too). So the compiler would have to have the code for Start-Timer and Stop-Timer visible to monkey with the generated code, spilling material around, and I'm not convinced it could do so even then.

So, in summary, I don't think you have to worry about it if you use function calls to start and stop the timer.
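
A minimal sketch of that arrangement, assuming a POSIX system with `clock_gettime`. The names `g_start`, `g_stop`, `start_timer`, `stop_timer`, `elapsed_ns` and `measure` are hypothetical. The GCC-specific `noinline` attribute stands in here for keeping the timer functions out of line in a single file, which is an assumption of this sketch and not something the answer prescribes.

```cpp
#include <cstdint>
#include <time.h>  // clock_gettime, CLOCK_MONOTONIC (POSIX)

// The timers record into globals, so each call has a visible side effect.
static timespec g_start;
static timespec g_stop;

// Kept out of line (GCC attribute) so each timer remains a real function call.
__attribute__((noinline)) void start_timer() {
    clock_gettime(CLOCK_MONOTONIC, &g_start);
}

__attribute__((noinline)) void stop_timer() {
    clock_gettime(CLOCK_MONOTONIC, &g_stop);
}

// Nanoseconds between the two recorded timestamps.
std::int64_t elapsed_ns() {
    return (g_stop.tv_sec - g_start.tv_sec) * 1000000000LL
         + (g_stop.tv_nsec - g_start.tv_nsec);
}

// Pre and Post would sit before and after this call in the caller.
std::int64_t measure(void (*body)()) {
    start_timer();  // Start-Timer
    body();         // Body
    stop_timer();   // Stop-Timer
    return elapsed_ns();
}
```

`measure` accepts any plain function or captureless lambda as the Body and returns the elapsed time in nanoseconds.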

Jonathan Leffler
  • This is my main concern, since the _Timer_ functions are tiny and virtually guaranteed to be inline. Thus the optimizer has full knowledge of what happens there. Also, the _Pre_ _Post_ and _Body_ can very well belong to a single function, so again the optimizer has full knowledge. – edA-qa mort-ora-y Jan 29 '12 at 14:34
  • Put the timer functions in a separate source file. This works until the 'whole program optimization' stuff takes place. Even then, I think the sequence point requirements limit what GCC can do (because the timers will have side-effects, recording the times somewhere). – Jonathan Leffler Jan 29 '12 at 15:40
  • If this works, then simply having the call to `clock_gettime` in the timer functions itself is probably sufficient. I can't see it has any knowledge of that function call. – edA-qa mort-ora-y Jan 29 '12 at 16:12

Make two versions of the code: one with the real code you want to measure, one with stubs. Measure both. Subtract. Then, I think, you needn't care what GCC does.
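
The idea can be sketched as follows. This is a rough illustration: `time_once_ns`, `Measurement` and `measure_net` are hypothetical names, `std::chrono::steady_clock` is an assumed clock, and taking the minimum over several runs is an addition of this sketch, not part of the suggestion itself.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

// Time one invocation of f in nanoseconds.
template <typename F>
std::int64_t time_once_ns(const F& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
}

struct Measurement {
    std::int64_t body_ns;  // best time for the real code
    std::int64_t stub_ns;  // best time for the empty stub
    std::int64_t net_ns;   // body_ns - stub_ns; may be negative for very short bodies
};

// Measure the real body and the stub the same way, then subtract.
template <typename Body, typename Stub>
Measurement measure_net(const Body& body, const Stub& stub, int repeats) {
    assert(repeats > 0);
    std::int64_t best_body = std::numeric_limits<std::int64_t>::max();
    std::int64_t best_stub = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < repeats; ++i) {
        best_body = std::min(best_body, time_once_ns(body));
        best_stub = std::min(best_stub, time_once_ns(stub));
    }
    return Measurement{best_body, best_stub, best_body - best_stub};
}
```

A call such as `measure_net(real_code, empty_stub, 5)` returns both raw timings along with their difference, so the overhead that was subtracted stays visible.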

bmargulies
  • I'm trying to address the correctness of the measurement. If it is wrong then the measuring of an empty stub will also be wrong. – edA-qa mort-ora-y Jan 29 '12 at 14:43
  • How? Subtracting will subtract out any error, unless you think that the optimizer will do something radically different in the two cases. – bmargulies Jan 29 '12 at 16:40