8

I am using for/while loops for implementing a delay in my code. The duration of the delay is unimportant here though it is sufficiently large to be noticeable. Here is the code snippet.

    uint32_t i;

    // Do something useful

    for (i = 0; i < 50000000U; ++i)
    {}

    // Do something useful

The issue I am observing is that this for loop won't get executed. It probably gets ignored/optimized by the compiler. However, if I qualify the loop counter i by volatile, the for loop seems to execute and I do notice the desired delay in the execution.

This behavior seems a bit counter-intuitive to my understanding of the compiler optimizations with/without the volatile keyword.

Even if the loop counter is getting optimized and being stored in the processor register, shouldn't the counter still work, perhaps with a lesser delay? (Since the memory fetch overhead is done away with.)

The platform I am building for is an Xtensa processor (by Tensilica), and the C compiler is the one provided by Tensilica, the Xtensa C/C++ compiler, running at the highest optimization level.

I tried the same with gcc 4.4.7 at the -O3 and -Ofast optimization levels. The delay seems to work in that case.

Peter Cordes
LoneWolf
  • The c compiler can detect that the for loop isn't doing anything at all, and can just remove the whole loop when i is not volatile. – user1937198 May 13 '15 at 07:19
  • 1
    Honestly, if I were a compiler, I would set i to 50000000U and be done, especially on high optimization levels. With volatile, it may be changed externally, so I couldn't optimize it out and read it from the register/cache/wherever in any iteration. – martin May 13 '15 at 07:19
  • 1
    The compiler doesn't care about preserving your variables or your loops; it cares about preserving things like input and output. Tearing out a do-nothing loop is completely permissible. – user2357112 May 13 '15 at 07:22
  • 5
    Also, since no one has mentioned it yet: **don't do this**. This is not how to create a delay. [Use one of the functions designed for the job.](http://www.gnu.org/software/libc/manual/html_node/Sleeping.html) – user2357112 May 13 '15 at 07:27
  • 1
    "I am using for/while loops for implementing a delay in my code." A very bad idea in the first place... – glglgl May 13 '15 at 07:27
  • @glglgl I know it's a bad idea to do so. As I already mentioned in my question, the duration/accuracy of the delay is insignificant here. It just has to be more than a certain minimum value. Moreover, the environment I am working is a single processor, no multi-processing, bare-metal firmware. – LoneWolf May 13 '15 at 07:45
  • Is the system call `nanosleep` or something like it available? Is a time delay actually required, or is there some state transition that could trigger code execution, etc.? – rickhg12hs May 13 '15 at 07:46
  • @user2357112 As you correctly mention, it's a vague approach to take. But in my case I am concerned more with this optimization part rather than the accuracy of the delay. It really is insignificant in my scenario. – LoneWolf May 13 '15 at 07:48
  • @LoneWolf Even in this case, your controller library might provide a clean solution, like `_delay_us()` on AVRs. Functions of this kind are often written in assembler and are quite accurate. Even if you don't need that accuracy, it is the cleanest solution, since anything else can be optimized out, as you have seen. – glglgl May 13 '15 at 07:54
  • 1
    See possible duplicate [Loop with a zero execution time](http://stackoverflow.com/q/26771692/1708801) and related [Do temp variables slow down my program?](http://stackoverflow.com/q/26949569/1708801) – Shafik Yaghmour May 13 '15 at 09:28
  • Possible duplicate of [How to prevent GCC from optimizing out a busy wait loop?](http://stackoverflow.com/questions/7083482/how-to-prevent-gcc-from-optimizing-out-a-busy-wait-loop) – Ciro Santilli OurBigBook.com Oct 15 '16 at 15:02

1 Answer

20

This is all about observable behavior. The only effect of your loop is that i is 50000000U afterwards, so the compiler is allowed to replace the loop with i = 50000000U;. That assignment is then optimized out as well, because the value of i has no observable consequences.

The volatile keyword tells the compiler that every read and write of i is itself observable behavior, which prevents it from optimizing those accesses (and therefore the loop) away.
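
To make the effect of the qualifier concrete, here is a minimal sketch of a spin delay that a conforming compiler has to keep. The function name `spin_delay` is made up for illustration, and the real-time length of the delay still depends on clock speed and generated code, so it only gives a rough lower bound.

```c
#include <stdint.h>

/* Hypothetical helper (name chosen for this example only).
 * Busy-waits for `iterations` passes of the loop.
 * Because the counter is volatile, each read and write of it is
 * treated as observable behavior, so the loop cannot be removed
 * or collapsed into a single assignment. */
static uint32_t spin_delay(uint32_t iterations)
{
    volatile uint32_t i;

    for (i = 0; i < iterations; ++i) {
        /* intentionally empty */
    }

    return i; /* final counter value, equal to `iterations` */
}
```

Calling `spin_delay(50000000U)` corresponds to the loop in the question with the counter declared volatile.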

The compiler will also not optimize away calls to functions whose code it cannot see. Theoretically, if a compiler had access to the whole OS code, it could optimize everything except accesses to volatile objects, which are typically used for hardware I/O.

These optimization rules all conform to what is written in the C standard (cf. comments for references).

Also, if you want a delay, use a function designed for it (e.g. an OS API). Such functions are reliable and don't consume CPU, unlike a spin delay like yours.
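
On a hosted POSIX system, one such API is `nanosleep`. The following is only a sketch under that assumption (it does not apply to bare-metal firmware without an OS), and the wrapper name `sleep_ms` is invented for this example.

```c
#define _POSIX_C_SOURCE 200809L /* expose nanosleep() under strict -std=c99/-std=c11 */
#include <errno.h>
#include <time.h>

/* Hypothetical wrapper (name chosen for this example only).
 * Blocks for at least `ms` milliseconds without spinning the CPU.
 * Returns 0 on success, -1 on invalid input or a non-signal error. */
static int sleep_ms(long ms)
{
    struct timespec req;

    if (ms < 0) {
        return -1;
    }

    req.tv_sec  = ms / 1000;
    req.tv_nsec = (ms % 1000) * 1000000L;

    /* If a signal interrupts the sleep, nanosleep() stores the
     * remaining time in its second argument; retry with it. */
    while (nanosleep(&req, &req) == -1) {
        if (errno != EINTR) {
            return -1;
        }
    }

    return 0;
}
```

Because the thread is descheduled while it sleeps, the compiler has nothing to optimize away and the CPU is free for other work.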

ElderBug
  • 5,926
  • 16
  • 25
  • 5
    I think this is covered in N1570 §5.1.2.3, especially in paragraph 4: *In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).* – user694733 May 13 '15 at 07:41
  • 1
    Thanks for the insights ElderBug. This is what I was actually looking for; some caveat in the C standard against this. As far as using the OS APIs for delay, I am working on a single processor, bare metal firmware and the accuracy/preciseness of the delay is insignificant here. I just need to make sure it is more than a certain threshold value. – LoneWolf May 13 '15 at 07:52
  • @user694733 Thanks for pointing it out. It surely reduced the hassle. :) – LoneWolf May 13 '15 at 07:54
  • 5
    @LoneWolf For bare-metal firmware, you often have access to a hardware timer. One good technique for a delay is to store the timer value, then loop until the value exceeds some target (calculated from the timer frequency). You just need a running timer for that (or start one for this purpose), and it provides a reliable minimum delay. – ElderBug May 13 '15 at 07:58
  • 5
    @LoneWolf It is an even worse idea to write delay loops in bare-metal firmware than in a hosted desktop application, because on a bare-metal system you have direct access to high-accuracy hardware timers. So use the on-chip timers of your MCU; they are there for a reason. – Lundin May 13 '15 at 09:12
  • @Lundin I totally agree with you on this. This is a pretty vague approach to implement delays. But as I said, I am more concerned about the behavior of the compiler when encountering such a condition. Maybe I should have phrased my question as being specific to the actual problem. I just went with this (read casual) approach to test the proper initialization of a device register. – LoneWolf May 13 '15 at 13:59
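
The timer-polling technique suggested in the comments above could look roughly like the sketch below. Everything hardware-related here is an assumption: `read_timer` and `fake_timer_count` are hypothetical stand-ins for a free-running 32-bit up-counter, simulated in software so the example runs on a host. On a real MCU, `read_timer` would instead read the memory-mapped timer register through a volatile pointer.

```c
#include <stdint.h>

/* HYPOTHETICAL stand-in for a free-running 32-bit hardware timer.
 * Each read advances the simulated count by one tick. */
static volatile uint32_t fake_timer_count;

static uint32_t read_timer(void)
{
    return fake_timer_count++;
}

/* Convert microseconds to timer ticks for a timer running at
 * `timer_hz`, rounding up so the delay is never shorter than asked. */
static uint32_t us_to_ticks(uint32_t us, uint32_t timer_hz)
{
    return (uint32_t)(((uint64_t)us * timer_hz + 999999U) / 1000000U);
}

/* Store the timer value, then poll until at least `ticks` ticks
 * have passed. Unsigned subtraction wraps modulo 2^32, so the
 * comparison stays correct when the counter overflows. */
static void delay_ticks(uint32_t ticks)
{
    uint32_t start = read_timer();

    while ((uint32_t)(read_timer() - start) < ticks) {
        /* poll */
    }
}
```

The loop cannot be optimized out here, because each poll reads a volatile object. Comparing elapsed ticks rather than an absolute target value is a deliberate choice, since it keeps working across counter wraparound as long as the delay is shorter than one full timer period.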