I need an efficient way to cast part of an array to a variable. Let's suppose the array is defined like this:

unsigned char bytes[240];

Now I need to get a uint32_t value from somewhere in the array, something like this:

uint32_t * word = reinterpret_cast<uint32_t *>(bytes[4]);

I think this will get me the second word in the array, right? My question is: is this safe and portable (Windows, Linux on x86 and x86_64; OS X would be nice; I don't care about ARM, IA-64, etc.)?

Marius Bancila
graywolf
  • Yes, yes. Why don't you simply compile it and try it out? – Theolodis Apr 22 '14 at 11:37
  • It should be `uint32_t * word = reinterpret_cast<uint32_t *>(&bytes[4]);`, otherwise you're casting the value of bytes[4] into a pointer. – jsantander Apr 22 '14 at 11:37
  • The problem is that it's not portable across different machine architectures, or to data coming from network connections (see also [Endianness](http://en.wikipedia.org/wiki/Endianness)) – πάντα ῥεῖ Apr 22 '14 at 11:38
  • @πάνταῥεῖ What you said is correct, but the OP only seems to care about Intel machines. – jsantander Apr 22 '14 at 11:42
  • @jsantander That's why I mentioned network transports. – πάντα ῥεῖ Apr 22 '14 at 11:43
  • @πάνταῥεῖ Oh, yes, you're right, network protocols are typically big-endian. – jsantander Apr 22 '14 at 11:45
  • @jsantander Thanks, good catch, I'll go with `(bytes + 4)` ^_^ @πάνταῥεῖ Yep, I know, but if I don't care about that (this code will only be used in internal calculations), it should work regardless of OS as long as I'm on a little-endian architecture? – graywolf Apr 22 '14 at 11:47
  • Also watch out for alignment issues. – Jarod42 Apr 22 '14 at 11:53
  • @Jarod42 I thought that an unsigned char[] wouldn't have holes in it... or what do you mean by alignment issues? – graywolf Apr 22 '14 at 12:06
  • "alignment issues" - your CPU may only be able to access an `int` if its address is a multiple of 4, for example. But your char array has no requirement to start on a multiple of 4. – M.M Apr 22 '14 at 12:15
  • You have no holes, but some architectures don't support accessing a `uint32_t` at an address that is not a multiple of `4` (the consequences range from performance penalties to crashes). You may want to read [Data_structure_alignment](http://en.wikipedia.org/wiki/Data_structure_alignment) – Jarod42 Apr 22 '14 at 12:17
  • Portable way: `uint32_t word = bytes[4] + 0x100ul * bytes[5] + 0x10000ul * bytes[6] + 0x1000000ul * bytes[7];`. You could wrap this into an inline function. – M.M Apr 22 '14 at 12:31
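M.M's expression can be wrapped into an inline function along these lines. This is a minimal sketch: the name `read_u32_le` and the bounds-checking `size` parameter are hypothetical additions, and the bytes are assumed to be stored least significant byte first, the same order the expression uses. It is written with shifts instead of multiplications, which is equivalent here. Because it only does arithmetic on individual bytes, it has no alignment, aliasing, or host-endianness dependence.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: reads the little-endian 32-bit value that starts at
// bytes[offset]. Each byte is widened to std::uint32_t before shifting, so
// bytes >= 0x80 can neither sign-extend nor overflow.
inline std::uint32_t read_u32_le(const unsigned char* bytes, std::size_t size,
                                 std::size_t offset) {
    assert(bytes != nullptr && offset + 4 <= size);  // stay inside the buffer
    const unsigned char* p = bytes + offset;
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}
```

For the question's buffer, `read_u32_le(bytes, sizeof bytes, 4)` reads the second word.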

3 Answers

You should use memcpy. This portably ensures that there are no alignment or strict aliasing problems. If no copy is needed, compilers are often smart enough to figure this out and directly reference the data in the array:

#include <cstdint>
#include <cstring>

std::uint32_t value;
std::memcpy(&value, &bytes[4], sizeof value);
// Modify value:
// ...
// Copy back to array:
std::memcpy(&bytes[4], &value, sizeof value);
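The two memcpy calls can be wrapped in small helpers so the offset arithmetic lives in one place. A minimal sketch: the names `load_u32` and `store_u32` are hypothetical, and values are read and written in the host's native byte order, matching the assumption that the data is produced and consumed by the same program on the same platform. For a fixed 4-byte size, optimizing compilers such as GCC and Clang typically turn each call into a single load or store rather than a function call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the 4 bytes at bytes[offset] into a uint32_t.
// memcpy places no alignment requirement on either pointer.
inline std::uint32_t load_u32(const unsigned char* bytes, std::size_t size,
                              std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= size);
    std::uint32_t value;
    std::memcpy(&value, bytes + offset, sizeof value);
    return value;
}

// Hypothetical helper: writes value back into bytes[offset] .. bytes[offset + 3].
inline void store_u32(unsigned char* bytes, std::size_t size,
                      std::size_t offset, std::uint32_t value) {
    assert(offset + sizeof value <= size);
    std::memcpy(bytes + offset, &value, sizeof value);
}
```

With the question's buffer: `std::uint32_t second = load_u32(bytes, sizeof bytes, 4);`.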
Mankarse
  • How efficient is `memcpy` for small sizes? I have only used it for copying large blocks but maybe I should be using it for small blocks as well. – Z boson Apr 22 '14 at 12:41

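Casting between char* and uint32_t* is fine, but be aware that the strict aliasing exception for char types only works in one direction: the standard lets you access any object through a char-type pointer, whereas reading the bytes of an unsigned char array through a uint32_t* is not covered by that exception and is formally undefined behavior.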

As others have pointed out, you can run into the problem of alignment when you cast a char* to a larger type. You can either work around this by doing the alignment yourself, or just use memcpy() as Mankarse suggests.
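Doing the alignment yourself means making sure every address you read a uint32_t from is a multiple of `alignof(std::uint32_t)`. A minimal sketch, assuming you control the buffer's declaration: declare it with `alignas` and round offsets up with a helper. The names `align_up` and `is_aligned_for` are hypothetical, and converting a pointer to `std::uintptr_t` to inspect its low bits is implementation-defined, although mainstream x86 and x86_64 compilers simply yield the address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: rounds offset up to the next multiple of alignment.
// alignment must be a power of two, which every alignof value is.
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: true if p is suitably aligned to hold a T.
template <typename T>
bool is_aligned_for(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % alignof(T) == 0;
}

// alignas makes bytes[0] aligned for uint32_t, so any offset produced by
// align_up(n, alignof(std::uint32_t)) is aligned as well.
alignas(std::uint32_t) unsigned char bytes[240];
```

This only takes care of alignment; memcpy() avoids having to reason about it at all.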

But even the memcpy() approach is subject to byte order problems: if the data was written on a little-endian machine (x86, for example) and is read on a big-endian machine (such as many PowerPC or SPARC systems), every multi-byte value comes out with its bytes reversed, and vice versa.
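To see which byte order the current machine uses, one option is to store a known integer and look at its first byte. A minimal sketch for C++11; `is_little_endian` and `native_read` are hypothetical names, and C++20 code could use `std::endian` from `<bit>` instead.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: stores 1 in a uint32_t and checks whether its least
// significant byte sits at the lowest address.
inline bool is_little_endian() {
    const std::uint32_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}

// Hypothetical helper: reads 4 bytes in the host's native order, exactly like
// the memcpy approach. The same bytes {1, 0, 0, 0} give 1 on a little-endian
// host and 0x01000000 on a big-endian one.
inline std::uint32_t native_read(const unsigned char* b) {
    assert(b != nullptr);
    std::uint32_t v;
    std::memcpy(&v, b, sizeof v);
    return v;
}
```

On x86 and x86_64, `native_read` of the bytes {1, 0, 0, 0} returns 1.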

So, if you want to write portable code, you need to use a byte order that you specify. You can easily do so using the bit shift operators:

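#include <cstdint>

// Bytes are taken as unsigned char and assembled in an unsigned type: with
// signed char, bytes >= 0x80 become negative, which corrupts the sum and
// makes the left shifts undefined behavior before C++20.
uint32_t read_word_le(const unsigned char* bytes) {
    return (uint32_t)bytes[0] +
        ((uint32_t)bytes[1] << 8) +
        ((uint32_t)bytes[2] << 16) +
        ((uint32_t)bytes[3] << 24);
}

uint32_t read_word_be(const unsigned char* bytes) {
    return (uint32_t)bytes[3] +
        ((uint32_t)bytes[2] << 8) +
        ((uint32_t)bytes[1] << 16) +
        ((uint32_t)bytes[0] << 24);
}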
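Writing works the same way in reverse: shift the value right and store one byte at a time. A minimal sketch of matching writers; the names `write_word_le` and `write_word_be` are chosen here to mirror the readers above and are not part of the original answer. Unsigned types keep every shift well defined.

```cpp
#include <cassert>
#include <cstdint>

// Stores value into bytes[0..3], least significant byte first.
inline void write_word_le(unsigned char* bytes, std::uint32_t value) {
    assert(bytes != nullptr);
    bytes[0] = static_cast<unsigned char>(value);
    bytes[1] = static_cast<unsigned char>(value >> 8);
    bytes[2] = static_cast<unsigned char>(value >> 16);
    bytes[3] = static_cast<unsigned char>(value >> 24);
}

// Stores value into bytes[0..3], most significant byte first
// (the order commonly used by network protocols).
inline void write_word_be(unsigned char* bytes, std::uint32_t value) {
    assert(bytes != nullptr);
    bytes[0] = static_cast<unsigned char>(value >> 24);
    bytes[1] = static_cast<unsigned char>(value >> 16);
    bytes[2] = static_cast<unsigned char>(value >> 8);
    bytes[3] = static_cast<unsigned char>(value);
}
```

Because the byte layout is fixed by the code rather than by the CPU, a buffer written with `write_word_be` on one machine reads back correctly with `read_word_be` on any other.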
cmaster - reinstate monica
  • Good point about byte order problems. My answer is only valid if the data in the array was originally put there by the same program (on the same platform with the same compiler). This is often a valid assumption, but it is obviously false if the data is being moved around a network or otherwise persisted between program runs. – Mankarse Apr 22 '14 at 15:53

I would avoid indexing into the char array if I know what is in the buffer. If it really is an array of integers, cast first and index afterwards for clarity. If you want the second 32-bit integer in the array:

uint32_t * words = reinterpret_cast<uint32_t *>(bytes);
uint32_t second = words[1];

It is hard to answer the portability question because you don't give much information about the use case. As long as the data in the bytes buffer is produced and consumed on the same machine, the code is portable (and you could simply use int). Things become messy when you exchange data produced on a different architecture.
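If the buffer really holds nothing but 32-bit integers, a variant of "convert first, index after" that does not depend on the buffer's alignment is to copy it once into a uint32_t array with memcpy and index that. A minimal sketch, assuming native byte order; `to_words` is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies a byte buffer into an array of native-order
// 32-bit words. The buffer size must be a multiple of sizeof(std::uint32_t).
template <std::size_t Bytes>
std::array<std::uint32_t, Bytes / sizeof(std::uint32_t)>
to_words(const unsigned char (&bytes)[Bytes]) {
    static_assert(Bytes % sizeof(std::uint32_t) == 0,
                  "buffer size must be a multiple of sizeof(std::uint32_t)");
    std::array<std::uint32_t, Bytes / sizeof(std::uint32_t)> words;
    // memcpy has no alignment requirement on either side.
    std::memcpy(words.data(), bytes, sizeof bytes);
    return words;
}
```

For the question's 240-byte buffer, `auto words = to_words(bytes);` yields a `std::array<std::uint32_t, 60>`, and `words[1]` is the second word. The trade-off is one copy of the whole buffer in exchange for ordinary, well-defined indexing afterwards.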

Joky