
I use gcc to compile a simple test program for an ARM Cortex-M4, and it optimizes the usage of global variables in a way that confuses me. What rules does gcc follow when optimizing the usage of global variables?

GCC compiler: gcc-arm-none-eabi-8-2019-q3-update/bin/arm-none-eabi-gcc

Optimization level: -Os

My test code:

The following code is in "foo.c". The functions foo1() and foo2() are called in task A, and the function global_cnt_add() is called in task B.

int g_global_cnt = 0;

void dummy_func(void);

void global_cnt_add(void)
{
    g_global_cnt++;
}

int foo1(void)
{
    while (g_global_cnt == 0) {
        // do nothing
    }

    return 0;
}

int foo2(void)
{
    while (g_global_cnt == 0) {
        dummy_func();
    }

    return 0;
}

The function dummy_func() is implemented in bar.c as follows:

void dummy_func(void)
{
    // do nothing
}

The assembly code of function foo1() is shown below:

int foo1(void)
{
    while (g_global_cnt == 0) {
  201218:   4b02        ldr r3, [pc, #8]    ; (201224 <foo1+0xc>)
  20121a:   681b        ldr r3, [r3, #0]
  20121c:   b903        cbnz    r3, 201220 <foo1+0x8>
  20121e:   e7fe        b.n 20121e <foo1+0x6>
        // do nothing
    }

    return 0;
}
  201220:   2000        movs    r0, #0
  201222:   4770        bx  lr
  201224:   00204290    .word   0x00204290

The assembly code of function foo2() is shown below:

int foo2(void)
{
  201228:   b510        push    {r4, lr}
    while (g_global_cnt == 0) {
  20122a:   4c04        ldr r4, [pc, #16]   ; (20123c <foo2+0x14>)
  20122c:   6823        ldr r3, [r4, #0]
  20122e:   b10b        cbz r3, 201234 <foo2+0xc>
        dummy_func();
    }

    return 0;
}
  201230:   2000        movs    r0, #0
  201232:   bd10        pop {r4, pc}
        dummy_func();
  201234:   f1ff fcb8   bl  400ba8 <dummy_func>
  201238:   e7f8        b.n 20122c <foo2+0x4>
  20123a:   bf00        nop
  20123c:   00204290    .word   0x00204290

In the assembly code of function foo1(), the global variable "g_global_cnt" is loaded only once, so if it is 0 on entry the while loop never terminates. The compiler optimizes the usage of "g_global_cnt", and I know I can add volatile to avoid this optimization.

In the assembly code of function foo2(), the global variable "g_global_cnt" is loaded and checked on every iteration of the while loop, so the loop can terminate.

What gcc optimization rules make the difference?

artless noise
Ivan
  • Optimization is done on an "as-if" basis. That means that a compiler is allowed to do whatever it wants as long as the resulting program behavior stays the same. If a variable isn't protected by a mutex (or similar) the compiler is allowed to assume that the variable is only used by a single thread. In other words... when a variable is shared by multiple threads, it is your task to use a mechanism, e.g. a mutex, to make sure the compiler knows that special rules apply for that variable. – Support Ukraine Oct 27 '22 at 09:45
  • @SupportUkraine this question has nothing to do with mutexes. The compiler doesn't know *statically* if a variable is protected with a mutex. This is just optimization. – Fra93 Oct 27 '22 at 09:49

1 Answer


In order to understand this behaviour, you have to think about side effects and sequence points.

For the compiler a side effect is a result of an operator, expression, statement, or function that persists even after the operator, expression, statement, or function has finished being evaluated.
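A small illustration of the distinction, using hypothetical helpers is_zero(), bump() and side_effect_demo() that are not part of the question: comparing a value leaves nothing behind, while incrementing a global leaves a change that persists after the call returns.

```c
/* Hypothetical global used only for this illustration. */
static int counter = 0;

/* No side effect: only computes a value from its argument. */
static int is_zero(int x)
{
    return x == 0;
}

/* Side effect: the write to `counter` persists after the call returns. */
static void bump(void)
{
    counter++;
}

/* Returns the value of `counter` after two bumps. */
int side_effect_demo(void)
{
    counter = 0;
    (void)is_zero(counter);  /* result discarded; nothing persists */
    bump();                  /* modifies `counter`: a side effect */
    bump();
    return counter;
}
```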

A sequence point, on the other hand, "defines any point in a computer program's execution at which it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed."

The main rule associated with sequence points is that, between two consecutive sequence points, an object's stored value may be modified at most once, and its prior value may be read only to determine the value to be stored.
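A sketch of two well-defined uses of sequence points, the comma operator and &&, with an expression that breaks the rule above mentioned only in a comment (sequence_point_demo() is a hypothetical name):

```c
int sequence_point_demo(void)
{
    int i = 0;

    /* Comma operator: the side effect of i++ is complete before the
       right operand is evaluated, so a is 1. */
    int a = (i++, i);

    /* &&: the left operand (and its i++) is fully evaluated first,
       so the right operand sees i == 2 and b is 1. */
    int b = (i++ == 1) && (i == 2);

    /* By contrast, `i = i++ + i++;` modifies i more than once between
       sequence points: undefined behavior, so it is not executed here. */

    return a * 10 + b;
}
```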

Citing the C standard:

In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object).
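To make the quoted rule concrete, here is a sketch (elision_demo() is a hypothetical name): the compiler may drop the discarded plain computation, but not the discarded volatile read, because accessing a volatile object is itself a side effect.

```c
int elision_demo(void)
{
    int plain = 41;
    volatile int vol = 41;

    (void)(plain + 1);  /* value unused, no side effects: may be dropped */
    (void)vol;          /* volatile access: must actually be performed */

    return plain + 1;
}
```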

In your code

int foo1(void)
{
    while (g_global_cnt == 0) {
        // do nothing
    }

    return 0;
}

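After g_global_cnt is read, the loop body has no side effects and makes no function calls, so nothing inside foo1() can change the variable. The compiler cannot know that it is modified outside the function (by task B), and without volatile it is allowed to assume it is not, so it concludes that a single read is enough and hoists it out of the loop. In foo2(), by contrast, the loop calls dummy_func(), which is defined in another translation unit (bar.c). The compiler cannot see its body, so it must assume the call might modify g_global_cnt and reload the variable after every call.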
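As a sketch of that as-if rewrite, the Cortex-M4 listing for foo1() (one ldr, then either return 0 or branch to itself forever) behaves like the following C. foo1_as_if() is a hypothetical name; gcc does not produce C, and this assumes g_global_cnt is a plain, non-volatile int.

```c
int g_global_cnt = 0;  /* plain int, as in the question */

int foo1_as_if(void)
{
    int cached = g_global_cnt;  /* the one and only load of the global */

    if (cached == 0) {
        for (;;) {
            /* spin forever: nothing in this loop can change `cached` */
        }
    }
    return 0;
}
```

Calling it only terminates if g_global_cnt was already non-zero on entry, which is exactly the behaviour observed in the question.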

The way to tell the compiler that each read has side effects is to declare the variable with the volatile type qualifier.
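A hosted sketch of the same pattern, assuming a POSIX system rather than the Cortex-M4: a SIGALRM handler stands in for task B (or an ISR), and volatile sig_atomic_t is the type the C standard sanctions for a flag written by a signal handler. g_flag, on_alarm() and wait_for_alarm() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <unistd.h>

/* volatile: every read in the loop below must be a real load. */
static volatile sig_atomic_t g_flag = 0;

/* Plays the role of task B / an ISR: writes the shared flag. */
static void on_alarm(int sig)
{
    (void)sig;
    g_flag = 1;
}

/* Busy-waits like foo1(). Returns the flag value seen on exit. */
int wait_for_alarm(void)
{
    g_flag = 0;
    signal(SIGALRM, on_alarm);
    alarm(1);  /* ask the OS to deliver SIGALRM in about one second */

    while (g_flag == 0) {
        /* do nothing: the volatile read is re-done on every iteration */
    }
    return g_flag;
}
```

If volatile is removed from g_flag and the file is built with optimization, gcc may hoist the load exactly as it did in foo1() and the loop may never end; the outcome depends on compiler version and flags.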

With int g_global_cnt = 0; (note that these listings are AArch64 output, not Cortex-M4):

        adrp    x0, g_global_cnt
        add     x0, x0, :lo12:g_global_cnt
        ldr     w0, [x0]
        cmp     w0, 0
        beq     .L3
        mov     w0, 0
        ret

With volatile int g_global_cnt = 0;:

        adrp    x0, g_global_cnt
        add     x0, x0, :lo12:g_global_cnt
        ldr     w0, [x0]
        cmp     w0, 0
        cset    w0, eq
        and     w0, w0, 255
        cmp     w0, 0
        bne     .L3
        mov     w0, 0
        ret
Fra93
  • https://stackoverflow.com/questions/2484980/why-is-volatile-not-considered-useful-in-multithreaded-c-or-c-programming – Support Ukraine Oct 27 '22 at 09:53
  • @SupportUkraine We don't need to beat that dead old horse yet again. Missing volatile for variables shared by several threads/processes/ISRs is a well-known nasty bug and it has nothing to do with race conditions whatsoever. However, the presence of this bug might also be an indication that the variable needs to be protected from race conditions as well. Which is a separate matter. – Lundin Oct 27 '22 at 09:54
  • @SupportUkraine All explained from a microcontroller perspective here: https://electronics.stackexchange.com/questions/409545/using-volatile-in-embedded-c-development/409570#409570. PC programmers tend to be oblivious to the well-known missing-volatile-causing-optimization-problems bug, and embedded programmers tend to be oblivious to similarly well-known race condition bugs caused by lack of protection of shared variables. Two separate issues. – Lundin Oct 27 '22 at 09:58
  • @SupportUkraine I can also point you to https://www.kernel.org/doc/html/latest/process/volatile-considered-harmful.html which explains why we don't need `volatile` in kernel programming. This is however not the point of the question. – Fra93 Oct 27 '22 at 10:00
  • @Lundin indeed, from my understanding, after specifying `volatile` you might have all other sorts of problems at runtime like race conditions etc. However, we need to solve one problem at a time :) – Fra93 Oct 27 '22 at 10:02
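The comments distinguish the missing-volatile problem from race conditions. As a sketch of the second issue, assuming a C11 compiler that provides <stdatomic.h> (g_atomic_cnt, global_cnt_add_atomic() and foo1_atomic() are hypothetical names), C11 atomics make the increment an indivisible read-modify-write. In practice gcc also does not hoist atomic loads out of a loop the way it hoisted the plain read in foo1().

```c
#include <stdatomic.h>

/* Hypothetical atomic replacement for g_global_cnt. */
static atomic_int g_atomic_cnt = 0;

/* Task B side: one indivisible increment, unlike `cnt++` on a plain or
   volatile int, which is a separate load, add and store. */
void global_cnt_add_atomic(void)
{
    atomic_fetch_add(&g_atomic_cnt, 1);
}

/* Task A side: each iteration performs a fresh atomic load. */
int foo1_atomic(void)
{
    while (atomic_load(&g_atomic_cnt) == 0) {
        /* do nothing */
    }
    return 0;
}
```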