Strange pointer arithmetic

Question

I came across some very strange pointer-arithmetic behaviour. I am developing a program to access an SD card from an LPC2148 using the ARM GNU toolchain (on Linux). One sector of my SD card contains the following data (in hex, checked with the Linux "xxd" command): fe 2a 01 34 21 45 aa 35 90 75 52 78. Printing the individual bytes works perfectly.

char *ch = buffer; /* char buffer[512]; */
for(i=0; i<12; i++)
    debug("%x ", *ch++);

Here the debug function sends its output over the UART. However, pointer arithmetic, specifically adding an offset that is not a multiple of 4, gives very strange results.

uint32_t *p; // uint32_t is a typedef for unsigned long.

p = (uint32_t*)((char*)buffer + 0);
debug("%x ", *p);   // prints 34012afe   // correct

p = (uint32_t*)((char*)buffer + 4);
debug("%x ", *p);   // prints 35aa4521  // correct

p = (uint32_t*)((char*)buffer + 2);
debug("%x ", *p);   // prints 0134fe2a  // TOO STRANGE??

Am I choosing a wrong compiler option? Please help. I tried optimization options -0 and -s, but nothing changed.

I considered little/big endianness, but here I am getting unexpected data (the preceding bytes) rather than a reversed byte order.

possible duplicate of [Casting pointers on embedded devices](http://stackoverflow.com/questions/14032434/casting-pointers-on-embedded-devices) — artless noise, Jan 31 '14 at 22:29
Thank you for quick reply redirecting me to duplicate questions. My ARM architecture is ARM7TDMI and as all pointed it does not support unaligned access. But probably for the same reason even structure member access is also giving incorrect results. For this reason I am not able to use any ready library like fat on my SD card. Any hints/solutions will be a great help. — nileshg, Feb 01 '14 at 01:18
@user3258584 try to disable rotating feature and force cpu to create an abort and handle unaligned access there. Did you try `-mno-unaligned-access` while compiling? http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html — auselen, Feb 01 '14 at 19:42
Thanks for suggestion. I'll definitely try this solution. And update results here. — nileshg, Feb 02 '14 at 01:39

barak manos · Accepted Answer · 2014-02-02T06:06:30.287

For this code to work, your CPU architecture must support unaligned load and store operations.

To the best of my knowledge, it doesn't (and I've been using STM32, which is an ARM Cortex-based part).

If you try to read a uint32_t value from an address which is not divisible by the size of uint32_t (i.e. not divisible by 4), then in the "good" case you will just get the wrong output.

I'm not sure what the address of your buffer is, but at least one of the three uint32_t reads described in your question requires the processor to perform an unaligned load operation.

On STM32, you would get a memory-access violation (resulting in a hard-fault exception).

The data-sheet should provide a description of your processor's expected behavior.

UPDATE:

Even if your processor does support unaligned load and store operations, you should try to avoid using them, as it might affect the overall running time (in comparison with "normal" load and store operations).

So in either case, you should make sure that whenever you perform a memory access (read or write) operation of size N, the target address is divisible by N. For example:

uint8_t  x = *(uint8_t*)y;  // 'y' must point to a memory address divisible by 1
uint16_t x = *(uint16_t*)y; // 'y' must point to a memory address divisible by 2
uint32_t x = *(uint32_t*)y; // 'y' must point to a memory address divisible by 4
uint64_t x = *(uint64_t*)y; // 'y' must point to a memory address divisible by 8

In order to ensure this with your data structures, always define them so that every field x is located at an offset which is divisible by sizeof(x). For example:

struct
{
    uint16_t a; // offset 0, divisible by sizeof(uint16_t), which is 2
    uint8_t  b; // offset 2, divisible by sizeof(uint8_t), which is 1
    uint8_t  c; // offset 3, divisible by sizeof(uint8_t), which is 1
    uint32_t d; // offset 4, divisible by sizeof(uint32_t), which is 4
    uint64_t e; // offset 8, divisible by sizeof(uint64_t), which is 8
};

Please note that this alone does not guarantee that your data structure is "safe": you still have to make sure that every myStruct_t* variable you use points to a memory address divisible by the size of the largest field (in the example above, 8).
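The offset rule can be checked by the compiler instead of by eye. This is a minimal sketch, assuming a C11 compiler (`_Static_assert`) and the standard `<stdint.h>` types; the names `record_t` and `CHECK_FIELD` are hypothetical and only serve the illustration.

```c
#include <stddef.h>   /* offsetof, size_t */
#include <stdint.h>

/* Hypothetical structure laid out so that no implicit padding is needed. */
typedef struct {
    uint16_t a;   /* offset 0 */
    uint8_t  b;   /* offset 2 */
    uint8_t  c;   /* offset 3 */
    uint32_t d;   /* offset 4 */
    uint64_t e;   /* offset 8 */
} record_t;

/* Compilation fails if a field's offset is not a multiple of its size. */
#define CHECK_FIELD(type, field) \
    _Static_assert(offsetof(type, field) % sizeof(((type *)0)->field) == 0, \
                   #field " is not at an offset divisible by its size")

CHECK_FIELD(record_t, a);
CHECK_FIELD(record_t, b);
CHECK_FIELD(record_t, c);
CHECK_FIELD(record_t, d);
CHECK_FIELD(record_t, e);

/* If the fields already sit at natural offsets, no padding is inserted. */
_Static_assert(sizeof(record_t) == 16, "unexpected padding in record_t");
```

If a field is later moved or resized so that the rule breaks, the build stops with the field name in the message rather than failing at run time on the target.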

SUMMARY:

There are two basic rules that you need to follow:

1. Every instance of your structure must be located at a memory address which is divisible by the size of the largest field in the structure.
2. Each field in your structure must be located at an offset (within the structure) which is divisible by the size of that field itself.

Exceptions:

Rule #1 may be violated if the CPU architecture supports unaligned load and store operations. Nevertheless, such operations are usually less efficient (an unaligned access may be split into several bus accesses or several instructions). Ideally, one should strive to follow rule #1 even if the CPU does support unaligned operations, and let the compiler know that the data is well aligned (using a dedicated #pragma), in order to allow the compiler to use aligned operations where possible.
Rule #2 may be violated if the compiler automatically generates the required padding. This, of course, changes the size of each instance of the structure. It is advisable to always use explicit padding (instead of relying on the current compiler, which may be replaced at some later point in time).
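For a raw sector buffer, rule #1 can be arranged by giving the buffer a word-sized view. The following is only a sketch, assuming a 512-byte sector and a C99 or later compiler; `sector_t`, `sector`, `is_aligned` and `sector_word` are hypothetical names, and the address check relies on the usual (implementation-defined) pointer-to-integer conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The uint32_t member forces the whole union, and therefore bytes[0],
   onto a boundary suitable for word access. */
typedef union {
    unsigned char bytes[512];   /* what the SD driver fills in */
    uint32_t      words[128];   /* aligned word view of the same storage */
} sector_t;

sector_t sector;

/* Returns 1 if ptr is a multiple of alignment, 0 otherwise. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Word access through the union: always aligned, index is in words. */
uint32_t sector_word(size_t index)
{
    assert(index < sizeof sector.words / sizeof sector.words[0]);
    return sector.words[index];
}
```

With this layout, `sector.bytes + 0` and `sector.bytes + 4` are word-aligned while `sector.bytes + 2` is not, so a check such as `is_aligned(sector.bytes + 2, 4)` reports the problematic case before any cast is attempted.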

In fact, Cortex-M3 [does support unaligned accesses](http://www.keil.com/forum/13940/), though you can enable generation of faults for such accesses. The OP has an older, ARM7 based chip. — Igor Skochinsky, Jan 31 '14 at 20:12
Thanks for quick reply. How can I handle this, specially when accessing a structure member? — nileshg, Feb 01 '14 at 02:20
Thanks. I followed this way and it is working well. However this approach is not usable for predefined struct like fat boot record on disk. But I managed to work with them. — nileshg, Feb 02 '14 at 01:36
You're welcome; as I mention at the end of the answer, you still have to make sure that every `myStruct_t*` variable that you are using, is pointing to a memory address divisible by the size of the largest field. So if you read the FAT from disk to RAM, then you should make sure that your FAT structure in RAM resides at a memory address which is divisible by the size of the largest field in it. — barak manos, Feb 02 '14 at 05:28
@user3258584 The process of parsing something like a **FAT** is called **serialization**. Generally, you use `char *`, extract each byte, and use **shifts** and **OR** to accumulate the values. The more recent ARMs have assembler instructions to help; if this is an application hot path, you can use such optimizations. Your compiler may support `__attribute__((packed))` or the like. — artless noise, Feb 03 '14 at 15:21

score 2 · Answer 2 · answered Jan 31 '14 at 20:03

LDR is the ARM instruction to load data. You have lied to the compiler by telling it that the pointer refers to a properly aligned 32-bit value. It is not aligned properly, and you pay the price. Here is what the LDR documentation says:

If the address is not word-aligned, the loaded value is rotated right by 8 times the value of bits [1:0].

See: 4.2.1. LDR and STR, words and unsigned bytes, especially the section Address alignment for word transfers.

Basically your code is like,

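  p = (uint32_t*)((char*)buffer + 0);
  uint32_t v = *p;        // aligned load of the word that contains buffer + 2
  v = (v>>16)|(v<<16);    // rotate that word by 16 bits
  debug("%x ", v);        // the rotated word, not the bytes at buffer + 2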

but it is encoded as a single instruction on the ARM. This behavior depends on the ARM CPU type and possibly on co-processor settings. It is also highly non-portable code.
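The quoted rule (ignore address bits [1:0], then rotate right by 8 times their value) can be modelled in portable C. This is a sketch only: it assumes little-endian memory and a buffer that itself starts on a word boundary, and the names `rotr32` and `ldr_rotated` are hypothetical. It is not a description of what any particular chip does.

```c
#include <stddef.h>
#include <stdint.h>

/* Rotate a 32-bit value right by n bits. */
uint32_t rotr32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v >> n) | (v << (32u - n)) : v;
}

/* Model of a word load at byte offset `offset` on a core that drops
   address bits [1:0] and rotates the aligned word instead.
   mem must cover the whole aligned word that contains `offset`. */
uint32_t ldr_rotated(const unsigned char *mem, size_t offset)
{
    size_t base = offset & ~(size_t)3;            /* aligned word address */
    uint32_t word = (uint32_t)mem[base]
                  | ((uint32_t)mem[base + 1] << 8)
                  | ((uint32_t)mem[base + 2] << 16)
                  | ((uint32_t)mem[base + 3] << 24);
    return rotr32(word, 8u * (unsigned)(offset & 3));
}
```

With the first eight bytes from the question (fe 2a 01 34 21 45 aa 35), the model gives 0x34012afe for offset 0, 0x35aa4521 for offset 4 and 0x2afe3401 for offset 2. The last value is built only from the bytes at offsets 0 to 3, which matches the observation that earlier bytes reappear; the exact digit order reported in the question differs, and this sketch does not attempt to explain that.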

score 0 · Answer 3 · answered Oct 10 '16 at 03:05

It's called "undefined behavior". Your code is casting a value which is not a valid unsigned long * into an unsigned long *. The semantics of that operation are undefined behavior, which means pretty much anything can happen.

In this case, two of your examples behaved as you expected because you got lucky and buffer happened to be word-aligned. Your third example was not as lucky (if it had been, the other two would not have been), so you ended up with a pointer with extra garbage in the 2 least significant bits. Depending on the version of ARM you are using, that could result in an unaligned read (which appears to be what you were hoping for), or it could result in an aligned read (using the most significant 30 bits) followed by a rotation (the word is rotated by the number of bytes indicated in the 2 least significant bits). It looks pretty clear that the latter is what happened in your third example.

Anyway, technically, all 3 of your example outputs are correct. It would also be correct for the program to crash on all 3 of them.

Basically, don't do that.

A safer alternative is to write the bytes into a uint32_t. Something like:

uint32_t w;
memcpy(&w, buffer, 4);
debug("%x ", w);
memcpy(&w, buffer+4, 4);
debug("%x ", w);
memcpy(&w, buffer+2, 4);
debug("%x ", w);

Of course, that's still assuming sizeof(uint32_t) == 4 && CHAR_BIT == 8, but that's a much safer assumption. (I.e., it should work on pretty much any machine with 8-bit bytes.)
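An alternative to memcpy is the byte-wise approach mentioned in the comments: read individual bytes and combine them with shifts and OR. The sketch below assumes the stored data is little-endian (as FAT on-disk fields are); `read_le16` and `read_le32` are hypothetical helper names.

```c
#include <stdint.h>

/* Assemble a little-endian 16-bit value from two bytes at any address. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)((uint16_t)p[0] | ((uint16_t)p[1] << 8));
}

/* Assemble a little-endian 32-bit value from four bytes at any address. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Because only byte loads are performed, the source address needs no particular alignment, and the result does not depend on the byte order of the CPU running the code. For the bytes in the question, `read_le32` at offset 2 combines 01 34 21 45 into 0x45213401.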
