1

Is there a way in C/C++ to cast a char array to an int at any position?

I tried the following, but the access is automatically aligned to the nearest 32 bits (on a 32-bit architecture) when I use pointer arithmetic with a non-constant offset:

unsigned char data[8];
data[0] = 0; data[1] = 1; ... data[7] = 7;
int32_t p = 3;
int32_t d1 = *((int*)(data+3));  // = 0x03040506  CORRECT
int32_t d2 = *((int*)(data+p));  // = 0x00010203  WRONG

Update:

  • As stated in the comments the input comes in tuples of 3 and I cannot change that.
  • I want to convert 3 values to an int for further processing and this conversion should be as fast as possible.
  • The solution does not have to be cross platform. I am working with a very specific compiler and processor, so it can be assumed that it is a 32 bit architecture with big endian.
  • The lowest byte of the result does not matter to me (see above).

My main questions at the moment are: Why does d1 have the correct value while d2 does not? Is this also true for other compilers? Can this behavior be changed?

Karsten
  • 2
    **Moderator Note**: The comments devolved. They were deleted. If you have an answer, please post it. Have a better way? Post it as an answer! Take a chance. – George Stocker Nov 18 '14 at 14:31
  • Since it does not seem to be possible I solved it partly by working with 4 pixels at a time to avoid alignment problems. – Karsten Nov 18 '14 at 17:16
  • This did not work either... The compiler seems to align at 2^N borders. – Karsten Nov 18 '14 at 18:21

2 Answers

4

No, you can't do that in a portable way.

The behaviour encountered when attempting a cast from char* to int* is undefined in both C and C++ (possibly for the very reasons that you've spotted: ints are possibly aligned on 4 byte boundaries and data is, of course, contiguous.)

(The fact that data+3 works but data+p doesn't is possibly due to compile-time vs. runtime evaluation.)

Also note that the signedness of plain char is implementation-defined in both C and C++, so you should use signed char or unsigned char explicitly if you're writing code like this.

Your best bet is to use the bitwise shift operators (>> and <<) together with bitwise | and & to absorb char values into an int. Also consider using int32_t in case you build for targets with 16 or 64 bit ints.
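A minimal sketch of the shift-based approach for the big-endian layout described in the question, assuming the three input bytes should land in the upper three bytes of the result and the lowest byte is simply left as zero. The name `pack3_be` is hypothetical, and `uint32_t` is used rather than `int32_t` so that shifting a byte with its top bit set into bit 31 stays well defined:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs data[offset], data[offset + 1] and
   data[offset + 2] into the upper three bytes of a 32-bit value,
   most significant byte first. The lowest byte is left as zero. */
static uint32_t pack3_be(const unsigned char *data, size_t offset)
{
    return ((uint32_t)data[offset]     << 24) |
           ((uint32_t)data[offset + 1] << 16) |
           ((uint32_t)data[offset + 2] <<  8);
}
```

With the question's data (bytes 0 through 7), `pack3_be(data, 3)` yields 0x03040500 regardless of the host's endianness or the alignment of `data + 3`.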

Bathsheba
3

There is no way: converting a pointer to a wrongly aligned one is undefined.

You can use memcpy to copy the char array into an int32_t.

int32_t d = 0;
memcpy(&d, data + 3, sizeof d); // copies exactly 4 bytes; requires <string.h>

Most compilers have built-in functions for memcpy with a constant size argument, so it's likely that this won't produce any runtime overhead.
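As a sketch of how the `memcpy` approach extends to a runtime offset (the case that failed in the question), the copy can be wrapped in a small helper. `load_i32` is a hypothetical name, and the caller is assumed to guarantee that at least four bytes are readable starting at `data + offset`:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies sizeof(int32_t) bytes starting at
   data + offset into an int32_t without ever forming a misaligned
   int pointer. The bytes are taken in memory order, so the resulting
   value depends on the host's endianness. */
static int32_t load_i32(const unsigned char *data, size_t offset)
{
    int32_t value;
    memcpy(&value, data + offset, sizeof value);
    return value;
}
```

On the big-endian target from the question, `load_i32(data, 3)` with bytes 0 through 7 should give 0x03040506; a little-endian host would give 0x06050403 for the same bytes.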

Even though a cast like you've shown is allowed for correctly aligned pointers, dereferencing such a pointer is a violation of strict aliasing. An object with an effective type of char[] must not be accessed through an lvalue of type int.

In general, type-punning is endianness-dependent, and converting a char array representing RGB colours is probably easier to do in an endianness-agnostic way, something like

int32_t d = (int32_t)data[2] << 16 | (int32_t)data[1] << 8 | data[0];
mafso
  • Not a good idea. C does not require `char` to be unsigned. If `char` is signed, the conversion to `int32_t` will sign-extend, potentially setting all the most significant bits in `d` with the `|`. – EOF Nov 18 '14 at 14:02
  • Unfortunately I cannot change the input. That is given in my case. I know I could do it with shifting and so on, but that is much slower and this part is rather performance critical. I hoped there would be a direct way which only needs roughly one instruction instead of several, since the data already is in the correct order in memory... – Karsten Nov 18 '14 at 14:04
  • @EOF: `data` is `unsigned char` in the question. – mafso Nov 18 '14 at 14:04
  • @EOF: I am using unsigned char so that would not be a problem – Karsten Nov 18 '14 at 14:04
  • @Karsten: Compilers usually have built-ins for `memcpy`-ing a constant amount of bytes, that shouldn't be any overhead. – mafso Nov 18 '14 at 14:05
  • I forgot that you want an offset into the array, so the union trick to align doesn't work anyway (`data+3` needs to be `int`-aligned here); I'll edit this out, I think. – mafso Nov 18 '14 at 14:07
  • 1
    @mafso: I see. However, you really don't need the union if you `memcpy()`. `char` aliases `int`, you can `memcpy()` directly to an `int`. – EOF Nov 18 '14 at 14:07
  • @Karsten: Your "one instruction to copy" goal may not work anyway for unaligned access (especially if you have a platform where unaligned access is a problem in the first place). – mafso Nov 18 '14 at 14:19
  • 1
    @Karsten: Are you *sure* shifting is too slow? – Bathsheba Nov 18 '14 at 14:23
  • @Bathsheba: I did some profiling, and with -O2 shifting is actually twice as slow as direct access/conversion, which unfortunately does not work. `memcpy` is only slightly faster than shifting. But I assume it is the best I can get :( – Karsten Nov 18 '14 at 14:44
  • 2
    @Karsten: Well, if you're willing to accept a program that "[...]does not work unfortunately", I can give you one that takes no time to run at all. Why do you compare a working program's runtime with a non-working one's? – EOF Nov 18 '14 at 14:53
  • @EOF: I did this comparison, because I hoped it would somehow be possible to achieve this and I was just not doing it the right way, because it worked with a constant offset. – Karsten Nov 18 '14 at 15:15
  • @Karsten: Does it work with a constant offset if the contents of `data` are unknown to the compiler? And for your real use-case: Is the offset constant? – mafso Nov 18 '14 at 15:23