There may be a very simple solution to this problem, but it has been bothering me for a while, so I have to ask.

In our embedded projects, it is common to have simple get/set functions for many variables in separate C files. Those functions are then called from many other C files. When I look at the assembly listing, those function calls are never replaced with move instructions. A faster way would be to declare the monitored variables as globals and avoid the unnecessary function calls.

Let's say you have a file.c with variables that need to be monitored from another C file, main.c: for example debugging variables, hardware registers, ADC values, etc. Is there a compiler optimization that replaces simple get/set functions with move instructions, thus avoiding the overhead of the function calls?

file.h

#ifndef FILE_H
#define FILE_H

#include <stdint.h>

int32_t get_signal(void);
void set_signal(int32_t x);

#endif

file.c

#include "file.h"
#include <stdint.h>

static volatile int32_t *signal = SOME_HARDWARE_ADDRESS;

int32_t get_signal(void)
{
  return *signal;
}

void set_signal(int32_t x)
{
   *signal = x;
}

main.c

#include "file.h"
#include <stdio.h>

int main(int argc, char *args[])
{
   // Do something with the variable
   for (int i = 0; i < 10; i++)
   {
     printf("signal = %d\n", get_signal());
   }
   
   return 0;
}

If I compile the above code with gcc -Wall -save-temps main.c file.c -o main.exe, I get the following assembly listing for main.c. The call to get_signal is always there, even when compiling with the -O3 flag, which seems silly as we are only reading a memory address. Why bother calling such a simple function?

The same applies to the simple set function. It is always called, even though it only writes to one memory location and does nothing else.

main.s

main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $48, %rsp
    .seh_stackalloc 48
    .seh_endprologue
    movl    %ecx, 16(%rbp)
    movq    %rdx, 24(%rbp)
    call    __main
    movl    $0, -4(%rbp)
    jmp .L4
.L5:
    call    get_signal
    movl    %eax, %edx
    leaq    .LC0(%rip), %rcx
    call    printf
    addl    $1, -4(%rbp)
.L4:
    cmpl    $9, -4(%rbp)
    jle .L5
    movl    $0, %eax
    addq    $48, %rsp
    popq    %rbp
    ret

UPDATED 2023-02-13

The question was closed with several links to answers about inlining and link-time optimization. I don't think the same question has been answered before, or at least the solution is not obvious for my get function. What is there to inline if a function just returns a value and does nothing else?

Anyway, as suggested, one way to fix this is to add the compiler flags -O2 -flto, which replace the call get_signal instruction with a move instruction, as the following partial output shows:

main:
    subq    $40, %rsp
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
    movl    tmp.0(%rip), %edx
    movl    $10, %eax
    .p2align 4,,10
    .p2align 3
.L4:
    movl    signal(%rip), %ecx
    addl    %ecx, %edx
    subl    $1, %eax
    jne .L4
    leaq    .LC0(%rip), %rcx
    movl    %edx, tmp.0(%rip)
    call    printf.constprop.0
    xorl    %eax, %eax
    addq    $40, %rsp
    ret
    .seh_endproc

Thank you.

faba
  • You must use [LTO](https://gcc.gnu.org/wiki/LinkTimeOptimization) because the functions are in separate compilation units. The compiler has no way of knowing what is in other compilation units, so it cannot optimize across them – phuclv Feb 13 '23 at 01:56
  • Another way might be to remove `static` from `signal` in the `.c`. Move the function definitions to the `.h` (replacing the prototypes) and add (e.g.) `static inline __attribute__((always_inline))` to each function. The `.h` would need (e.g.) `extern volatile int32_t *signal;` Then, each call will be inlined. – Craig Estey Feb 13 '23 at 02:14
  • Is there a caller which, after calling `get_signal()`, perhaps calls [`main_screen(TURN_ON)`](https://en.wikiquote.org/wiki/Zero_Wing)? :P Sorry, yes, you either need LTO (`gcc -O2 -flto` when you compile *and* when you link), or make the full definitions visible to callers at compile time, e.g. in the header with `static inline` or plain `inline`; in the latter case you need a stand-alone definition in exactly one `.c` in case the compiler chooses not to inline at every call-site. – Peter Cordes Feb 13 '23 at 02:57
  • Also if you care about the asm not sucking, enable optimization as well, like `-Og` at least. If you need it to inline even at `-O0`, `__attribute__((always_inline))` (as well as making the definition visible so that's possible.) – Peter Cordes Feb 13 '23 at 03:02
  • *What is there to inline if a function just returns a value and does nothing else?* - The fact that it's that simple doesn't open up any new possibilities for getting it to inline. That's why I closed it as a duplicate. The compiler can't inline it if it can't see the definition, but if you give it a way to inline it will. – Peter Cordes Feb 13 '23 at 05:09
  • Unless you consider turning it into a macro instead of a function declaration, which would still give you the opportunity to turn it into a function later if you do need to hook each access to something other than `#define get_signal() (*signal)`. But inline functions are simple and easy, especially if you just `static inline int32_t get_signal(){ return *signal; }` right in the header next to the variable declaration, so it seemed to me that the best solution was one of the existing duplicates. – Peter Cordes Feb 13 '23 at 05:11
  • TL:DR: I closed it as a duplicate because I don't think this case is special; not only does the general case work, there isn't anything better you can do. Not that I know of, anyway. I can of course reopen if someone else knows of a neat trick. – Peter Cordes Feb 13 '23 at 05:15
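
For reference, the header-based approach suggested in the comments could look roughly like the sketch below. It is collapsed into a single translation unit so it can be compiled on its own, and it makes assumptions that are not in the question: the hardware register is modelled by an ordinary variable, that variable is named `signal_reg` (a hypothetical name, chosen because a non-static `signal` would clash with the C library's `signal()` function), and `read_back()` is a hypothetical caller standing in for code in main.c.

```c
#include <stdint.h>

/* ---- would live in file.h ---- */

/* Assumption: stand-in for the hardware register so the sketch runs on a PC.
 * On a real target this could instead be a pointer to a fixed address. */
extern volatile int32_t signal_reg;

/* Definitions are visible to every caller, so the compiler can inline them
 * without needing link-time optimization. */
static inline int32_t get_signal(void)
{
    return signal_reg;
}

static inline void set_signal(int32_t x)
{
    signal_reg = x;
}

/* ---- would live in file.c ---- */

/* Exactly one definition of the shared object. */
volatile int32_t signal_reg = 0;

/* ---- would live in main.c ---- */

/* Hypothetical caller: writes a value and reads it back through the
 * accessors. Because the object is volatile, both accesses are kept. */
int32_t read_back(int32_t value)
{
    set_signal(value);
    return get_signal();
}
```

With the definitions in the header, an optimizing build (e.g. -O2) should typically turn each accessor call into a plain load or store, at the cost of exposing the variable's name to every file that includes the header.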
