Just a quick question concerning the Rust programming language. Assume you had the following in C:

uint8_t *someblockofdata; /* has certain length of 4 */
uint32_t *anotherway = (uint32_t*) someblockofdata;

Regardless of the code not being all that useful and rather ugly, how would I go about doing that in Rust? Say you have a `&[u8]` with a length divisible by 4: how would you "convert" it to a `&[u32]` and back (preferably avoiding unsafe code as much as possible and retaining as much speed as possible)?

Just to be complete, the case where I would want to do that is an application which reads u8s from a file and then manipulates those.

Leon
  • The answer strongly depends on the environment. There are processors out there that could raise an "unaligned access" exception, so your program would crash under some circumstances. – harper Dec 10 '14 at 19:01
  • Well you *can* `transmute` a `*u8` to a `*u32`, or convert a `&[u8]` to a `&[u32]`, but the right way depends on what exactly you're trying to achieve, so better ask a targeted question. –  Dec 10 '14 at 19:25
  • It's also a [strict aliasing](http://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) violation in C. – M.M Dec 10 '14 at 20:40
  • @MattMcNabb: Are you sure? I seem to recall that `void*`, `char*`, `signed char*` and `unsigned char*` (and their `const`/`volatile` variants) are exempt from the strict aliasing rule to allow raw memory manipulation. – Matthieu M. Dec 11 '14 at 08:39
  • @MatthieuM see question linked in my comment. Aliasing *to* char is OK but it's not a symmetric relation – M.M Dec 11 '14 at 08:44
  • @MattMcNabb: Ah, indeed, reading Ben Voigt's answer clarified the asymmetry. On the other hand, a `(my_packet_t*)` cast is routinely used in C for zero-copy code; I would surmise there will be no ill effect in practice as long as the memory is not *changed* via the `char*` variable, but it is indeed an aliasing violation anyway. – Matthieu M. Dec 11 '14 at 09:02

1 Answer

Reinterpreting a pointer through a cast is only defined between pointers to objects of alignment-compatible types. It may happen to work on some implementations, but it is non-portable. For one thing, the result depends on the endianness (byte order) of your data relative to that of the machine, so you may lose performance anyway through byte-swapping.
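To illustrate the alignment concern on the Rust side, here is a minimal sketch of a zero-copy view built on the standard library's `slice::align_to`. The function name `view_as_u32s` is made up for this example, and the resulting values are in the host's byte order, so this does nothing about endianness. Note also that `align_to` is documented as being allowed to put more data in the prefix or suffix than strictly necessary, so `None` here means "could not view", not "definitely misaligned".

```rust
/// Hypothetical helper: try to view `bytes` as `u32` values without copying.
/// Returns `None` unless the whole slice could be reinterpreted, which
/// requires 4-byte alignment and a length that is a multiple of 4.
fn view_as_u32s(bytes: &[u8]) -> Option<&[u32]> {
    // SAFETY: every bit pattern is a valid `u32`, so reinterpreting
    // initialized bytes as `u32` values is sound. `align_to` itself only
    // hands back the correctly aligned middle part.
    let (prefix, words, suffix) = unsafe { bytes.align_to::<u32>() };
    if prefix.is_empty() && suffix.is_empty() {
        Some(words)
    } else {
        // Leftover bytes at either end: misaligned start or a stray tail.
        None
    }
}
```

A caller would need a fallback path for the `None` case, which is one reason the byte-by-byte approach is often simpler.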

First rewrite your C as follows, verify that it does what you expect, and then translate it to Rust.

// If the bytes in the file are little endian (10 32 means 0x3210), do this:
// (Each byte is cast to uint32_t first, so that << 24 cannot overflow a signed int.)
uint32_t value = (uint32_t)someblockofdata[0] | ((uint32_t)someblockofdata[1] << 8)
                 | ((uint32_t)someblockofdata[2] << 16) | ((uint32_t)someblockofdata[3] << 24);

// If the bytes in the file are big endian (32 10 means 0x3210), do this:
uint32_t value = (uint32_t)someblockofdata[3] | ((uint32_t)someblockofdata[2] << 8)
                 | ((uint32_t)someblockofdata[1] << 16) | ((uint32_t)someblockofdata[0] << 24);

// Middle endian is left as an exercise for the reader.
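For the Rust translation, one possible sketch (not the only way to do it) uses `chunks_exact` together with `u32::from_le_bytes` and `u32::to_le_bytes` from the standard library, which avoids `unsafe` entirely at the cost of copying into a `Vec<u32>`. The function names `bytes_to_u32s_le` and `u32s_to_bytes_le` are invented for this example, and little-endian file data is an assumption.

```rust
/// Hypothetical helper: decode `bytes` into `u32` values, assuming the data
/// is little endian. Returns `None` if the length is not a multiple of 4.
fn bytes_to_u32s_le(bytes: &[u8]) -> Option<Vec<u32>> {
    if bytes.len() % 4 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(4)
            // Each chunk is exactly 4 bytes long, so the indexing cannot panic.
            .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
            .collect(),
    )
}

/// Hypothetical helper: encode `u32` values back into little-endian bytes.
fn u32s_to_bytes_le(words: &[u32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(words.len() * 4);
    for w in words {
        out.extend_from_slice(&w.to_le_bytes());
    }
    out
}
```

For big-endian data, swapping in `u32::from_be_bytes` and `to_be_bytes` should be enough. On a little-endian host the compiler can typically turn the per-chunk conversion into a plain load, though that is worth measuring rather than assuming.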
Damian Yerrick
  • Note: while technically correct, this is not an exact answer to the question. Specifically, it does not address iterating over a `&[u8]` as a `&[u32]` without losing performance. – Matthieu M. Dec 11 '14 at 08:40
  • @MatthieuM. I clarified the answer to indicate that you may lose performance anyway to byte-swapping. – Damian Yerrick Dec 11 '14 at 18:34