
I am working with the ARM compiler and have a hardware peripheral with direct memory access that requires the memory buffers passed to it to be 32-byte aligned. This is not a problem when the buffers are global or static, since they can be defined with the aligned attribute that the compiler supports. The problem arises whenever a buffer has to be defined locally in a function, i.e. with automatic storage class. I tried something similar to the following:

typedef struct  __attribute__((aligned(32)))
{
    char bytes[32];
} aligned_t;

_Static_assert(sizeof(aligned_t)==32, "Bad size");

void foo(void)
{
    aligned_t alignedArray[NEEDED_SIZE/sizeof(aligned_t)];
    //.... use alignedArray
}

This compiled happily and worked with an x86 compiler, but not with armcc, which complains:

Warning: #1041-D: alignment for an auto object may not be larger than 8

So this approach does not work. There is another one, which I consider ugly:

void foo(void)
{
    char unalignedBuffer[NEEDED_SIZE + 32 - 1];
    char *pAlignedBuffer = ALIGN_UP_32(unalignedBuffer);
    //.... use pAlignedBuffer
}

where ALIGN_UP_32 is a macro that returns the first 32-byte-aligned address within unalignedBuffer (the implementation details are not important here, I guess).
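For readers who want something concrete, here is a minimal sketch of what such a macro might look like. It is not necessarily the implementation used in the question: it assumes that converting a pointer to uintptr_t yields a plain address (true on typical ARM targets, but not guaranteed by the C standard), and NEEDED_SIZE and foo_sketch are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEEDED_SIZE 128u /* hypothetical size, for illustration only */

/* Round a pointer up to the next 32-byte boundary.
   Assumption: uintptr_t holds the pointer as a flat address. */
#define ALIGN_UP_32(p) \
    ((char *)(((uintptr_t)(p) + 31u) & ~(uintptr_t)31u))

/* Hypothetical variant of foo(): returns 1 when NEEDED_SIZE aligned
   bytes fit inside the over-allocated automatic buffer. */
int foo_sketch(void)
{
    char unalignedBuffer[NEEDED_SIZE + 32 - 1];
    char *pAlignedBuffer = ALIGN_UP_32(unalignedBuffer);

    /* The rounded pointer must sit on a 32-byte boundary. */
    assert(((uintptr_t)pAlignedBuffer & 31u) == 0);

    /* At most 31 bytes are skipped, so NEEDED_SIZE bytes remain. */
    return (pAlignedBuffer >= unalignedBuffer) &&
           (pAlignedBuffer + NEEDED_SIZE <=
            unalignedBuffer + sizeof unalignedBuffer);
}
```

The extra 32 - 1 bytes cover the worst case, in which the raw buffer starts one byte past a 32-byte boundary.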

As I said, I don't like this approach. Is there a more elegant way to achieve the same result?

Jabberwocky
Eugene Sh.
  • Is dynamic allocation an option? – dbush Nov 29 '18 at 15:42
  • @dbush Not really. I guess I should have mentioned this. – Eugene Sh. Nov 29 '18 at 15:43
  • Can you explain why you need automatic storage class in the first place (do you need reentrancy)? – user694733 Nov 29 '18 at 15:55
  • @user694733 I am trying to provide the least constrained way of using the specific HAL I am developing. In the end (some of) the buffers are to be provided by the caller, and I am trying to come up with a common convention, which would be either transparent (ideally) or well documented. – Eugene Sh. Nov 29 '18 at 16:11
  • You could use the static buffer for the hardware access, and copy to/from unaligned automatic buffer as necessary – M.M Nov 30 '18 at 08:34
  • @M.M Yeah, I used this approach in some places, but for large amounts of data it will be impractical and will hurt performance (of course you could argue that stack allocation is impractical for large buffers as well, but the goal here is to have some generic way to work with a buffer of any storage class) – Eugene Sh. Nov 30 '18 at 14:21

2 Answers


I am working with the ARM compiler

Have you also tried a recent GCC (perhaps configured as a cross-compiler), e.g. GCC 8 as of November 2018?

The stack pointer is (probably) not guaranteed by the ARM ABI to be aligned to 32 bytes.

So an automatic variable is not guaranteed to be aligned as strictly as you want.

You could avoid automatic variables (and systematically use a suitably aligned heap memory zone instead). Or you could allocate more than what is needed and do pointer arithmetic on it.

I feel that your char* pAlignedBuffer = ALIGN_UP_32(unalignedBuffer); is a good approach, and I would expect an optimizing compiler to generate quite efficient code for it.

I don't like this approach and wondering if there is a more elegant way to achieve the same?

I believe your approach is good, and any other way would be equivalent.

PS. Another approach might be to patch your GCC compiler (perhaps with a plugin) to change the default alignment of the stack pointer (hence effectively changing your ABI and calling conventions). That would take you weeks (or months) of effort.

Basile Starynkevitch
  • I guess I'll take it as a second opinion :) – Eugene Sh. Nov 29 '18 at 16:25
  • The stack pointer is indeed not required to be 32-byte aligned by the ARM ABI. It's required to be 8-byte aligned across function boundaries in different translation units, but there's no requirement for any specific alignment in other places, and in general auto-class variables may have any alignment. – cooperised Nov 29 '18 at 19:55
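Since automatic variables may end up at any alignment, a HAL that accepts caller-provided buffers could verify the alignment at its entry point instead of trusting the caller. The following is only a sketch under stated assumptions: hal_is_dma_aligned, hal_submit_buffer, and HAL_DMA_ALIGNMENT are hypothetical names that do not come from the question, and the check assumes that uintptr_t holds a pointer as a flat address.

```c
#include <stddef.h>
#include <stdint.h>

#define HAL_DMA_ALIGNMENT 32u /* the 32-byte requirement from the question */

/* Hypothetical helper: nonzero when ptr sits on a 32-byte boundary. */
static int hal_is_dma_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % HAL_DMA_ALIGNMENT) == 0;
}

/* Hypothetical HAL entry point: returns 0 when the buffer is accepted
   and -1 when it is rejected (null, empty, or misaligned). */
int hal_submit_buffer(void *buf, size_t len)
{
    if (buf == NULL || len == 0 || !hal_is_dma_aligned(buf)) {
        return -1;
    }
    /* ... hand buf and len to the peripheral here ... */
    return 0;
}
```

With a check like this, the convention can stay "any storage class is fine as long as the pointer is 32-byte aligned", and a misaligned automatic buffer is rejected up front rather than corrupting a transfer.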

Your two options look like the simplest. However (and I am just guessing, as I have not thought much about this answer), another option could be to create a second stack. When the function that contains your buffer is executed, the context is switched (well, just the SP, in supervisor mode) so that the SP points to the second stack. This stack is allocated in a 32-byte-aligned section and contains only 32-byte-aligned objects, so a local 32-byte-aligned variable is allocated in a 32-byte-aligned chunk of memory, which is released once the variable goes out of scope. Once the function has executed, the SP is switched back to the main stack. The execution of the function has to be treated as a critical region in order to avoid push/pop on the wrong stack. I don't think this would cause a stack overflow, but as I said, this is just a guess, in case it helps...

Jose