
In our project, we are using the ticlang compiler, i.e. TI's flavor of clang. The optimization level is set to -Os.

The code has variables of a struct type that are only used within a single C file and are therefore defined as static struct_type_xy variable;

The compiler performs some optimization where the members of such a struct are not kept in sequence in one block of memory but are re-ordered and even split.

This means that, while debugging, such variables cannot be displayed properly. Of course, I could define them as volatile, but that would also prevent the compiler from optimizing multiple accesses to the same members, which I don't want.

Therefore I want to prevent this kind of optimization.

What is the name of such an optimization and how can I disable it in clang?

I don't have an MCVE yet but I can provide a few details:

typedef struct
{
  Command_t      Command; // this is an enum type
  int            Par_1;   // System uses 32 bit integers.
  int            Par_2;
  int            Par_3;
  int            Par_4;
  size_t         Num_Tok;
} Cmd_t;

static Cmd_t     Cmd;

The map file then contains:

                  20000540    00000004     Cmd.o (.bss.Cmd.1)
                  20000544    00000004     Cmd.o (.bss.Cmd.2)
                  20000548    00000004     Cmd.o (.bss.Cmd.5)
                  2000054c    00000004     HAL_*
                  ...
                  2000057b    00000001     XY_*
                  2000057c    00000001     Cmd.o (.bss.Cmd.0)

The parts of Cmd are split across memory and some are even removed. (I used a build configuration where the 2 missing members are not used, but the struct definition is identical for all configurations.)

If I remove static, this changes to:

                  200004c4    00000018     (.common:Cmd)
Gerhardh
  • 5
    I doubt that a (mostly) conforming compiler like Clang re-orders or even splits a structure in the way you say. Would you mind to [edit] your question and provide a [mre], please? – the busybee Jul 26 '22 at 11:58
  • Rather than *not being kept in sequence*, I think it more likely that you're seeing [structure padding](https://stackoverflow.com/q/4306186/10871073). Show us some code, and we can confirm or deny ... – Adrian Mole Jul 26 '22 at 12:04
  • @thebusybee, the standard just says that *addresses* of members are in growing order. So if no address is used then the conforming compiler could still reorder the members. I might be wrong. It may be worth making language-lawyer question about it. – tstanisl Jul 26 '22 at 12:10
  • 1
    If the objects in question have internal linkage (as the OP claims) and no pointer to them is ever computed then the compiler could conceivably determine that it is safe to perform the kind of optimization described, and it might make sense to do so when optimizing for size. The effect would probably be visible via a debugger. – John Bollinger Jul 26 '22 at 12:11
  • 1
    The usual way to work around debugging difficulties produced by optimizations is to build a version *without* optimization and debug that. – John Bollinger Jul 26 '22 at 12:15
  • I don't really doubt that this is legal as there is no difference visible in output i.e. the code behaves as if it was not optimized that way. – Gerhardh Jul 26 '22 at 12:27
  • @tstanisl Ok, the OP proves the claim, and the example even puts the first element after the last element. =-O – the busybee Jul 26 '22 at 12:50
  • 1
    How about you add code which takes the address of the struct, and maybe just puts that into a volatile variable. If that is not enough, try memcpy on the struct? Ideally this code is run just once on startup, or something. A single static volatile pointer to the struct might also be enough to prevent this optimization. – hyde Jul 26 '22 at 13:34
  • @hyde Thanks for the suggestion. I know how to circumvent this with some hacks. I am more interested in a cleaner solution that affects all such variables at once. If that is some extra compiler optimization, I would assume there is some flag for it. – Gerhardh Jul 26 '22 at 13:42
  • I think clang can show the optimizations it is using (don't remember which command line switch). Comparing that list to `-Og` optimization level (which presumably blocks this optimization, at least I'd imagine so) might help you find this. – hyde Jul 26 '22 at 13:50

1 Answer


Clang is apparently scalarizing the static struct, breaking it up into separate members, since the address is never taken or used, and doesn't escape the compilation unit. This lets it optimize away unused members.
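
As a rough illustration of the conditions described above, here is a minimal sketch of a file-local struct that would be a candidate for this treatment. The enum values, the trimmed member list, and the function names (cmd_store, cmd_par1, cmd_num_tok) are hypothetical and not taken from the question's code. Whether a given compiler version really splits it has to be checked in the map file.

```c
#include <stddef.h>

/* Hypothetical stand-in for the question's enum type. */
typedef enum { CMD_NONE, CMD_RUN } Command_t;

typedef struct
{
  Command_t Command;
  int       Par_1;
  int       Par_2;    /* never accessed in this file */
  size_t    Num_Tok;
} Cmd_t;

/* Internal linkage, and nothing below ever forms &Cmd:
 * every access names one member directly. */
static Cmd_t Cmd;

void cmd_store(Command_t c, int p1, size_t n)
{
  Cmd.Command = c;    /* written but never read back */
  Cmd.Par_1   = p1;
  Cmd.Num_Tok = n;
}

int cmd_par1(void)
{
  return Cmd.Par_1;
}

size_t cmd_num_tok(void)
{
  return Cmd.Num_Tok;
}
```

Because the program's observable behavior depends only on the individual members, an optimizer is free to treat each member as its own variable and to drop Par_2 entirely.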

LLVM has a "Scalar Replacement of Aggregates" (sroa) optimization pass. https://llvm.org/docs/Passes.html#sroa-scalar-replacement-of-aggregates (The alloca mentioned in that doc is an LLVM IR instruction, not the C alloca() function. Also, google found a random copy of the LLVM source that implements this while I was trying to find the right search terms.)

clang -O3 -Rpass=sroa might print a "remark" for each struct it optimizes, if that pass supports optimization reports.

According to Clang optimization levels, -sroa is enabled at -O1 and higher. But -sroa isn't a clang option, nor is it an LLVM option that works as clang -mllvm -sroa. In 2011, someone asked about adding a command-line option to disable an arbitrary optimization pass; IDK if any such feature ever got added.


clang -cc1 -mllvm -help-list-hidden does show some interesting option names, like --stop-before=<pass-name> and --start-after=<pass-name>, and there's a --sroa-strict-inbounds.

clang -mllvm --sroa-strict-inbounds -O1 does actually compile, but I don't know what it does.

clang -mllvm --stop-before=sroa -O3 hello.c doesn't work on my system with clang 13. Or with --stop-before=-sroa. I get error in backend: "sroa" pass is not registered.

So I don't know how to actually disable this optimization pass, but that's almost certainly the one responsible. This is as far as I've gotten.

It's enabled at -O1, so it's not viable to use a lower optimization level and then enable the other optimization flags that -O1 normally implies. -O0 is special, and marks everything as optnone, to make sure code-gen is suitably literal, storing/reloading everything between C statements.
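
Failing a global switch, one per-variable workaround that might be worth trying is the used attribute, which GCC and clang both accept on variables with static storage duration. This is an assumption, not something tested on ticlang: the attribute is documented as keeping the definition in the object file, and whether it also keeps the members together in one block should be confirmed in the map file. The DEBUG_KEEP macro and the function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the question's enum type. */
typedef enum { CMD_NONE, CMD_RUN } Command_t;

typedef struct
{
  Command_t Command;
  int       Par_1;
  int       Par_2;
  size_t    Num_Tok;
} Cmd_t;

/* Hypothetical helper macro: expands to the attribute only on
 * compilers that understand the GNU attribute syntax. */
#if defined(__GNUC__) || defined(__clang__)
#define DEBUG_KEEP __attribute__((used))
#else
#define DEBUG_KEEP
#endif

/* Assumption: marking the object as "used" tells the compiler that
 * references may exist which it cannot see, so it should keep the
 * whole object. Member accesses are still ordinary, non-volatile
 * accesses. */
DEBUG_KEEP static Cmd_t Cmd;

void cmd_store(Command_t c, int p1, size_t n)
{
  Cmd.Command = c;
  Cmd.Par_1   = p1;
  Cmd.Num_Tok = n;
}

int cmd_par1(void)
{
  return Cmd.Par_1;
}
```

Unlike volatile, this does not force a load or store for every member access, but it has to be applied to each variable individually, so it is not the global setting the question asks for.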

Peter Cordes