GCC PowerPC avoiding .rodata section for floats

Question

I'm writing C code and compiling it for the PowerPC architecture. The code contains floating-point constants that I want placed in the .text section instead of .rodata, so that the function code is self-contained.

The problem is that on PowerPC, the only way to get a floating-point value into a floating-point register is to load it from memory. This is a restriction of the instruction set.

To convince GCC to help me, I tried declaring the floats as static const: no difference. Using pointers: same result. Using __attribute__((section(".text"))) on the function: same result. Applying the attribute to each floating-point variable individually gives:

error: myFloatConstant causes a section type conflict with myFunction

I also tried disabling optimizations via #pragma GCC push_options, #pragma GCC optimize("O0") and #pragma GCC pop_options. On top of that, pretending I have an unsigned int worked:

unsigned int *myFloatConstant = (unsigned int *) (0x11000018);
*myFloatConstant = 0x4C000000;

Using the float:

float theActualFloat = *(float *) myFloatConstant;

I would still like to keep -O3, but with it GCC uses .rodata again. Since this happens from -O1 onward, a potential answer would say which optimization flag causes the floats to be placed in .rodata.

Best case scenario would be that I can use floats "normally" in the code plus maximum optimizations and they never get placed in .rodata at all.

What I imagine GCC could do is place the float constant in between the code, mixing data and code, load it from there into a floating-point register, and continue. I believe this is possible to write manually, but how do I make GCC do it? Forcing the attribute per variable causes the error above, but technically this should be feasible.

The POWER ABI is a bit funny; see `man gcc` and the POWER `-msdata` option in particular. On the GCC dev mailing list, someone mentioned that adding `-G 0` to gcc options "fixes" this; could you try that and report whether that makes gcc do what you prefer? — Nominal Animal, Aug 19 '17 at 11:21
@fuz I guess to optimize cache usage, reduce TLB misses, cache faults etc? — Antti Haapala -- Слава Україні, Aug 19 '17 at 11:43
@fuz Maybe the code is used in a non-standard way (e.g. code injected into microcontroller RAM and executed there) which requires to be absolutely position-independent. — Martin Rosenau, Aug 19 '17 at 11:51
@AnttiHaapala: Modern CPUs (including x86 and PowerPC) have split L1 caches, and separate first-level TLBs, for instructions and data. Loading data from the same cache line that's currently executing can hit in L2, though. (Unless your L2 is exclusive with L1D, like on AMD Bulldozer-family). The dTLB can still miss, too. It's common on x86 for the L2 TLB to hold evicted entries from iTLB and dTLB, but the entry for the current page will be in the iTLB, and there's no reason to expect it to be in the L2 TLB, so the dTLB may well trigger a page walk. — Peter Cordes, Aug 21 '17 at 00:14
Wasting L1I cache footprint on data, and wasting L1D cache footprint on code, is usually not a good idea. Other than that, it's not worse than separate data, but it's probably not much better. However, having your data in the actual `.text` section *near* your function isn't inherently bad, and doesn't waste anything if they're in separate cache lines. If it's in the cache-line after, maybe L2 prefetch will even bring in the data before it's demand-loaded. — Peter Cordes, Aug 21 '17 at 00:16
(*writing* near executing code is slow on some CPUs. This may only affect x86, not PPC, because [x86 has I-cache coherent with data cache](https://stackoverflow.com/a/18388700/224132). But read-only data won't cause self-modifying-code pipeline flushes or other nasty effects.) — Peter Cordes, Aug 21 '17 at 00:25

score 3 · Accepted Answer · answered Aug 19 '17 at 11:50

Using GCC 7.1.0 powerpc-eabi (cross compiler under Linux) the following code worked for me:

float test(void)
{
    int x;
    volatile float y;
    float theActualFloat;

    *(float *)&x = 1.2345f;
    *(int *)&y = x;
    theActualFloat = y;

    return theActualFloat;
}

Resulting assembly code:

test:
    stwu 1,-24(1)
    lis 9,0x3f9e
    ori 9,9,0x419
    stw 9,8(1)
    lfs 1,8(1)
    addi 1,1,24
    blr

Explanation:

In the line *(float *)&x = value you write to an integer variable, so the compiler can reduce the assignment to an integer operation. That operation builds the bit pattern from immediates and does not access a floating-point value in .rodata.

The line *(int *)&y = x is a pure integer operation anyway.

The line theActualFloat = y cannot be optimized away because y is volatile, so the compiler has to write the integer to the variable on the stack and then read the result back from that variable.
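
The same store-then-reload idea can be sketched as a reusable helper that takes the IEEE 754 bit pattern as an integer, so no float literal appears in the function at all. This is only a sketch under assumptions: the names float_from_bits and test_bits are hypothetical, float is assumed to be a 32-bit IEEE 754 single, and a volatile union stands in for the pointer casts (reading a union member other than the one last written is allowed in C99 and later and in GNU C). Whether GCC really keeps the constant out of .rodata at -O3 should be confirmed in the generated assembly for your target.

```c
#include <stdint.h>

/* Hypothetical helper, not part of the answer above.
 * Assumes float is a 32-bit IEEE 754 single. */
static inline float float_from_bits(uint32_t bits)
{
    /* volatile forces a real integer store followed by a real float load,
     * so the compiler cannot fold the value into a float constant. */
    volatile union {
        uint32_t u;
        float f;
    } pun;

    pun.u = bits;   /* integer store to the stack slot */
    return pun.f;   /* float load from the same stack slot */
}

/* 0x3F9E0419 is the bit pattern of 1.2345f, matching the lis/ori pair
 * in the assembly listing above. */
float test_bits(void)
{
    return float_from_bits(0x3F9E0419u);
}
```

Each call costs a store followed by a dependent reload, so the helper is best called once per function rather than inside a loop.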

Martin Rosenau


This is fine but what about a generic function/macro to return a float instead of duplicating it entirely and changing the `*(float *)&x = 1.2345f` line? – BullyWiiPlaza Aug 20 '17 at 10:14
Type-punning with pointer-casts violates strict aliasing. Use a `union` if possible (which is guaranteed to work in C99 and later, as well as GNU89 and GNU C++). I guess you could make the `union` `volatile` to force the compiler to store to it with data from immediates, instead of optimizing away the compile-time constant. It would be nice if there was a way to get that without forcing the compiler to redo it every time after inlining this function into a loop, though. I guess that's not really relevant since the OP wants a stand-alone function. – Peter Cordes Aug 21 '17 at 00:28
@BullyWiiPlaza and Martin: IIRC, PowerPC really doesn't like store/reload, especially between FPU and integer. I tried to google up something about this, but mostly found stuff like http://alex-simon.blogspot.ca/2010/04/load-hit-store.html which has some useful C++-programming suggestions but seems pretty fuzzy on the microarchitectural details. (Apparently at least some PowerPC uarches can do store-hit-load forwarding like x86 does, for integer store/reload, when the load isn't wider than the previous store. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71310) – Peter Cordes Aug 21 '17 at 00:45

score 0 · Answer 2 · answered Aug 20 '17 at 22:17

I found another solution which avoids stack frame creation and .rodata usage but requires an absolute memory address to store the float in:

static inline volatile float *getFloatPointer(int address, float value) {
    float *pointer = (float *) address;
    *pointer = value;

    return pointer;
}

It is used like this:

volatile float *myFloat = getFloatPointer(0x12345678, 30.f);
printf("%f", *myFloat);

It is important not to create a local float variable and to use only volatile pointers, so that GCC won't fall back to .rodata again.

BullyWiiPlaza


You definitely want to do `float foo = *myFloat;` outside of a loop, because the compiler has to actually emit a load instruction every time you use `*myFloat`, because it's a pointer-to-volatile. You just need to stop the compiler from doing constant-propagation all the way to a compile-time constant `float` which it will put in `.rodata`. You don't need or want to stop it from keeping the constant in a register for your whole function. – Peter Cordes Aug 21 '17 at 00:52
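
A minimal sketch of that hoisting, with hypothetical names (scale_all and factor_ptr are not from the answer): the pointer-to-volatile is dereferenced exactly once, and the loop then works on an ordinary local copy that the compiler is free to keep in a register.

```c
#include <stddef.h>

/* Hypothetical example: multiply n values by a factor that is only
 * reachable through a pointer-to-volatile. */
static void scale_all(float *values, size_t n, volatile const float *factor_ptr)
{
    /* One volatile load, done before the loop. */
    const float factor = *factor_ptr;

    /* The loop body uses the plain local, so no load is forced per iteration. */
    for (size_t i = 0; i < n; ++i)
        values[i] *= factor;
}
```

Writing `values[i] *= *factor_ptr;` inside the loop instead would force a fresh load on every iteration.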
