

The scenario: I want to read 4 bytes of data from a given pointer of type char*.
For example, consider the following:

int a=0;
char* c; // points to some address

What I want to do is read 4 bytes starting at c (i.e. at that address) and assign them to the integer variable a.

My Solution:

a = *(int*)c;  // Assembly is LDR    r1, [r6,#0x00]

My Problem:
The above solution works on some architectures but fails on others. Specifically, in my case it fails on an ARM Cortex-M0.

If anyone has a portable, highly efficient (minimal assembly) replacement for my solution, please share it. It would be a great help to me, and I thank you in advance ;)

Please ask if more info is needed.

Mrmj

4 Answers


The problem could be because of alignment. Some CPU architectures can't read or write non-byte values on unaligned addresses.

The solution is to use byte-wise access instead, which can easily be done with memcpy:

memcpy(&a, c, sizeof a);
Some programmer dude
  • There might still be an endianness issue with that solution. – Jabberwocky Feb 01 '17 at 11:16
  • @someProgrammerDude: Though this can solve the problem, compare the efficiency: my code generates one load instruction, while yours will generate a whole lot of extra code because of the memcpy call. Furthermore, I don't have memcpy available here :( – Mrmj Feb 01 '17 at 12:16
  • @Mrmj Most compilers have special handling of `memcpy` to make it *very* efficient, and might even replace it inline with only a few instructions. You say you can code it with only a single instruction? If the problem is alignment, then you actually *can't* do it with only a single instruction. You should also consider the maintainability and legibility of the code, especially if endianness is not an issue. Lastly, why not try it out? Build with optimization and look at the generated assembly code. – Some programmer dude Feb 01 '17 at 12:23
  • @Mrmj And note that unaligned access *is* invalid on the Cortex-M0 ([reference](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0497a/BABFAIGG.html)). – Some programmer dude Feb 01 '17 at 12:29
  • @Mrmj Lastly, the Cortex-M0 is little-endian by default (the [`ENDIANNESS` flag of `AIRCR` register](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0497a/Cihehdge.html) is [by default zero](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0497a/Cihehdge.html)), meaning that if you communicate with other little-endian systems (which all x86 systems are), then the only possible reason for a failure is alignment. But we don't actually know that, because your question is too vague. – Some programmer dude Feb 01 '17 at 12:37
  • @Mrmj Please take some time to [read about how to ask good questions](http://stackoverflow.com/help/how-to-ask), and learn how to create a [Minimal, Complete, and Verifiable Example](http://stackoverflow.com/help/mcve). – Some programmer dude Feb 01 '17 at 12:38
  • @Some programmer dude: Yes, you got it right, friend! The problem is caused by unaligned access. I didn't say I can do that with one instruction. I said "My code generates one load instruction", meaning that my current code, which works on some architectures, generates only one instruction (LDR) and the job is done. So I was looking for something close to that which works on all architectures. +1 for the Cortex link. I'll surely try to improve my question. :) – Mrmj Feb 01 '17 at 12:53
  • @Mrmj Glad to have helped, even if the answer isn't exactly what you want to hear. :) However, the only portable way to handle this (without caring about endianness issues) is to use `memcpy`. It will probably be more efficient than you expect anyway. :) – Some programmer dude Feb 01 '17 at 12:58

There are many different problems here.

  • Alignment. The char pointer must point at an aligned address if you wish to read an integer at that address.
  • Signedness of char. It is implementation-defined whether char is treated as signed or unsigned. It is therefore a bad type to use for any form of bit/byte manipulation. Instead, use uint8_t.
  • Pointer aliasing. Reading the data pointed at by a char* through an int* is undefined behavior (unless the object there actually is an int), as it violates the so-called strict aliasing rule. This could cause your code to be incorrectly optimized by the compiler (particularly gcc). The other way around, accessing an int through a char*, would have been fine, though.

Endianness is not an issue if the stored integer is already in the same endianness format as that of the current system. If not, you'd have to convert it, but that's quite unrelated to the question here...

Example of a portable, safe solution:

#include <stdint.h>
#include <assert.h>
#include <string.h>

#include <stdio.h>
#include <inttypes.h>


int main (void) {

  int x = 123;
  uint8_t* c = (uint8_t*)&x; // point to something that is an int
  assert((uintptr_t)c % _Alignof(uint32_t) == 0); // ensure no misalignment

  uint32_t i;
  memcpy(&i, c, sizeof(i)); // safely copy data without violating strict aliasing

  printf("%" PRIu32 "\n", i); // prints 123

  return 0;
}
Lundin

If endianness is an issue for you:

Instead of:

a = *(int*)c;  // Assembly is LDR    r1, [r6,#0x00]

you need this:

On big endian systems:

a = (uint32_t)(unsigned char)c[0] << 24 | (uint32_t)(unsigned char)c[1] << 16 | (uint32_t)(unsigned char)c[2] << 8 | (unsigned char)c[3];

On little endian systems:

a = (uint32_t)(unsigned char)c[3] << 24 | (uint32_t)(unsigned char)c[2] << 16 | (uint32_t)(unsigned char)c[1] << 8 | (unsigned char)c[0];

// probably faster (only equivalent on little-endian systems):
memcpy(&a, c, sizeof a);
Jabberwocky
  • Why would he have a pointer to an integer in memory that is not of the same endianness as the CPU? That seems rather unrelated to the question. – Lundin Feb 01 '17 at 12:59

Depending on the endianness of the data, assemble the bytes in one order or the other:

#include <stdio.h>

int main(void)
{
    unsigned char bytes[] = { 0xAA, 0x55, 0xAA, 0x55 };
    unsigned int a=0;
    unsigned char* c = bytes;

    /* Little-endian data: the first byte is the least significant. */
    a += (*c++ & 0xFFFFFFFFu) << 0;
    a += (*c++ & 0xFFFFFFFFu) << 8;
    a += (*c++ & 0xFFFFFFFFu) << 16;
    a += (*c & 0xFFFFFFFFu) << 24;

    printf("HEX: %X\n", a);

    a = 0;
    c = bytes;

    /* Big-endian data: the first byte is the most significant. */
    a |= (*c++ & 0xFFFFFFFFu) << 24;
    a |= (*c++ & 0xFFFFFFFFu) << 16;
    a |= (*c++ & 0xFFFFFFFFu) << 8;
    a |= (*c & 0xFFFFFFFFu) << 0;

    printf("HEX: %X\n", a);
}
LPs
  • This generates a lot of instructions (at least from my point of view) compared to mine, which generates only one assembly instruction. – Mrmj Feb 01 '17 at 12:19
  • As I wrote: if you are aware of endianness issues, you are free to use your method. Otherwise you must take care of it. – LPs Feb 01 '17 at 12:22
  • Why would he have a pointer to an integer in memory that is not of the same endianness as the CPU? That seems rather unrelated to the question. – Lundin Feb 01 '17 at 12:59
  • @Lundin I've seen code like this too many times, accessing the payload of byte packets to extract data (e.g. Modbus). So I'm probably biased ;) – LPs Feb 01 '17 at 13:26