
Basically, my question is whether the following Rust code can cause UB:

fn usigned_as_signed_ref<'a>(x: &'a u64) -> &'a i64 {
    assert!(*x <= i64::MAX as u64);
    unsafe { std::mem::transmute(x) }
}

I need it because I have a struct containing a u64 value, but I want it to implement an interface that requires returning this value by reference. So if I can't use the above code, I would have to store the value twice, once as i64 and once as u64. In other words, the situation is as follows:

trait ForeignInterface {
    fn get_value<'a>(&'a self) -> &'a i64;
}

struct MyStruct(u64);

impl ForeignInterface for MyStruct {
    fn get_value<'a>(&'a self) -> &'a i64 { usigned_as_signed_ref(&self.0) }
}

My thoughts so far

On a "standard system" (like x86) I expect this to be perfectly safe. In particular, i64 should be represented as two's complement, and the same endianness should be used for u64 and i64. In fact, I cannot imagine that this is different anywhere. However, I have also learned from experience that if something is not guaranteed by the specs to work everywhere, it will fail in weird circumstances. I could not find any guarantee that this works.

I was not quite sure whether this question fits StackOverflow or CodeReview better, feel free to migrate it if you think otherwise.

Edit: This question seems to indicate that it is indeed OK, but does not give any evidence.

isaactfa
Feanor
  • It is safe; size, alignment and validity are identical for both types (and references thereof). – isaactfa Jul 03 '23 at 11:00
  • It's also important to distinguish between soundness and correctness: wrong endianness might make the code incorrect but it wouldn't make it unsound. Soundness should always be derivable from Rust's memory model (which it unfortunately doesn't fully have quite yet) and be independent of a given system's architecture. But in this case, someone would really have to hate programmers to make signed and unsigned integers somehow be represented differently in memory, so I'd reckon you're good on that front. – isaactfa Jul 03 '23 at 11:09
  • Should be OK, but for me that's heresy. – Stargateur Jul 03 '23 at 11:13
  • I guess one question I have is, if it's fine to panic if the value is greater than `i64::MAX`, why not just store it in an `i64` in the first place? – isaactfa Jul 03 '23 at 11:21
  • @isaactfa: Yes, that is definitely a point to consider. There are some reasons against it (readability, other interfaces, possibly performance for additional checks?). However, it might still be better than that ugly transmute. I wanted to know the answer to the question before I decide... – Feanor Jul 03 '23 at 13:10

1 Answer


Wrong endianness or byte representation will not make the code UB; it can make it incorrect (not doing what you intended), but the behavior will still be perfectly defined. Also, endianness is a property of the machine, not the type, and all Rust integers are defined to be two's complement.
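
To illustrate the point about representation, here is a minimal sketch (not part of the original answer; the helper name `same_bytes` is hypothetical). It compares the native-endian bytes of a u64 with those of the same bit pattern held as an i64. Because endianness belongs to the machine rather than the type, the two should match on any target.

```rust
/// Hypothetical helper: returns true if `v`, stored as a u64, has the same
/// in-memory bytes as the same bit pattern stored as an i64.
fn same_bytes(v: u64) -> bool {
    // `as` between integers of the same width keeps the bits unchanged.
    let s = v as i64;
    // Both calls use the machine's native byte order.
    v.to_ne_bytes() == s.to_ne_bytes()
}

/// Hypothetical helper: the value seen when the bits of `v` are read as i64.
fn reinterpret(v: u64) -> i64 {
    v as i64
}
```

For values up to `i64::MAX`, `reinterpret` returns the same number; above that the bytes still match, but the number read back is negative (for example, `reinterpret(u64::MAX)` is `-1`). That is a correctness concern, which is what the assert in the question guards against, not a soundness one.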

The only things that determine this conversion's soundness are:

  1. Size. The size of the target type must be <= that of the source type. The sizes of u64 and i64 are guaranteed to be the same, so we're fine here.
  2. Alignment. The alignment of the target type must be <= that of the source type. This is less well-defined, as the alignment of primitive types is not guaranteed, but I'd say it is fine to assume that the signed and unsigned types have the same alignment. If you are concerned, you can add an assert before the conversion:
     assert!(std::mem::align_of::<i64>() <= std::mem::align_of::<u64>());
  3. Uninitialized bytes. Every uninitialized byte in the source (a padding byte or an uninitialized MaybeUninit) must be met with a possibly-uninitialized byte in the target (a padding byte or MaybeUninit). Primitives do not have uninitialized bytes (I don't know if this is specified somewhere, but people rely on it, so it won't change), so this is trivially fine.
  4. Library invariants. When transmuting a type, you must make sure not to create a value that violates the invariants imposed by the library that defines the type. Integers have no such invariants (besides the language invariants, such as no uninitialized memory), so we're fine there too.
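
The size and alignment conditions can also be checked at compile time instead of at run time. The following is a sketch under that assumption, not the code from the question: the name `unsigned_as_signed_ref` is a hypothetical respelling of the asker's function, and the pointer cast is swapped in for `transmute` as one possible way to write the same conversion.

```rust
use std::mem::{align_of, size_of};

// Compile-time checks for the size and alignment conditions: the build
// fails if either assumption does not hold on the target platform.
const _: () = assert!(size_of::<i64>() <= size_of::<u64>());
const _: () = assert!(align_of::<i64>() <= align_of::<u64>());

/// Hypothetical variant of the function from the question.
fn unsigned_as_signed_ref(x: &u64) -> &i64 {
    // Correctness check, not a soundness requirement: ensures the value
    // reads as the same number through both types.
    assert!(*x <= i64::MAX as u64);
    // SAFETY: size and alignment are checked above, integers have no
    // padding or uninitialized bytes, and every bit pattern is a valid i64.
    unsafe { &*(x as *const u64 as *const i64) }
}
```

The returned reference borrows from the input, so the usual borrow rules still apply to the caller.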
Chayim Friedman