
I'm writing a VM in Rust and I have a C and C++ background. I need union-like functionality because each slot on the VM stack stores either an int or a float.

In C I had a union:

union stack_record_t {
    int i;
    float f;
};

I can use the record as int or as float with zero runtime overhead. I have a static bytecode analyzer which will find type errors before the bytecode executes, so I don't have to store a flag alongside the record.

I don't know whether it is a good idea to use unions in Rust, because they are unsafe. Is there a safe way to do this in Rust that is also zero-cost? Or should I just use Rust's unsafe unions?

Shepmaster
  • 3
    Why not use an `enum` in Rust? `union` in Rust was added mostly for FFI. I would never use a `union` without a fair bit of safe wrapper around it, bringing it to the same level as `enum`. – mcarton Mar 30 '20 at 12:56
  • But an enum requires additional memory because it is tagged right? – Kerbo Games Mar 30 '20 at 12:57
  • 1
    How do you know whether you have an `int` or `float` in your example union? – mcarton Mar 30 '20 at 12:57
  • 3
    Explain "zero overhead". Given `union stack_record arg;`, you can't use `arg.i` or `arg.f` unless you *know* that `arg` *is* an `int` or a `float`, respectively. So how are you passing that knowledge around? If there's a tag stored alongside `arg` that tells you what type it is, that's basically what an `enum` is; the cost of the `enum` isn't "overhead", it's just the cost you were paying anyway, only wrapped into a nice package. If you have some other way of knowing what type `arg` is, then an unsafe `union` might be the correct way to handle it. – trent Mar 30 '20 at 12:57
  • It depends on which instructions are executed from the bytecode. For example if the next instruction is add it will use i if it is addf (add floats) it will use f. – Kerbo Games Mar 30 '20 at 13:04
  • 1
    How should the VM behave if you use an instruction with the "wrong" value? e.g. you use an addf instruction with a value that was set using `i`. In C++ this is undefined behavior; in C or Rust it is defined to reinterpret the bytes (which still may lead to undefined behavior in certain cases, but not for `i32`->`f32` conversions, which is why that conversion is also possible in safe code using [`f32::from_bits`](https://doc.rust-lang.org/std/primitive.f32.html#method.from_bits)). – trent Mar 30 '20 at 13:13
  • I have a static bytecode analyzer which will find these kinds of errors before the bytecode executes. Is there any significant runtime overhead when using f32::from_bits? – Kerbo Games Mar 30 '20 at 13:18
  • Look at the implementation: https://doc.rust-lang.org/src/core/num/f32.rs.html#496-500. It's a very thin layer over transmute, which is basically memcpy. – SirDarius Mar 30 '20 at 13:19
  • Okay, thanks. So let's say I want to store an f32 inside an i32 without a cast (`as`), keeping only the bit pattern, like a union does. How would I do that? Something like: i: i32::from_bits(val.get_bits())? – Kerbo Games Mar 30 '20 at 13:21
  • 1
    f32 also has the reverse method: `to_bits()`, so you'd do: `val.to_bits()`. Also note that what @trentcl said is not completely exact, as to_bits and from_bits convert between f32 and u32 (unsigned). – SirDarius Mar 30 '20 at 13:31
  • Okay, thanks, I will use that then. I just hope the underlying transmute won't be too slow. – Kerbo Games Mar 30 '20 at 13:37

1 Answer

4

You can use `f32::from_bits` and `to_bits` to safely reinterpret the raw bits of a `u32` as an `f32` and vice versa. This is a "free" conversion: it compiles to no code (with optimizations turned on).¹ To convert between `u32` and `i32` you can use `as` casts, which are likewise free when used to change signedness.

It seems to me that `u32` is the common denominator here, so you might consider making a struct that contains a `u32` and exposes methods to get or set the appropriate type:
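As a minimal sketch of the conversions described above, the two helpers below chain `to_bits`/`from_bits` with an `as` cast to move a bit pattern between `f32` and `i32`. The function names `float_to_int_bits` and `int_bits_to_float` are made up for this illustration; only `f32::to_bits`, `f32::from_bits`, and `as` come from the language and standard library.

```rust
/// Reinterpret the bits of an `f32` as an `i32` (no numeric conversion).
/// Hypothetical helper name, used only for this sketch.
pub fn float_to_int_bits(f: f32) -> i32 {
    // f32 -> u32 keeps the bit pattern; u32 -> i32 only changes signedness.
    f.to_bits() as i32
}

/// Reinterpret the bits of an `i32` as an `f32` (no numeric conversion).
/// Hypothetical helper name, used only for this sketch.
pub fn int_bits_to_float(i: i32) -> f32 {
    f32::from_bits(i as u32)
}
```

Note the difference from a numeric cast: `1.5f32 as i32` rounds toward zero and yields `1`, while `float_to_int_bits(1.5)` yields the raw IEEE 754 pattern `0x3FC0_0000`.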

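// One stack slot: 32 raw bits, interpreted as i32 or f32 by the caller.
pub struct Record(u32);

impl Record {
    // Signedness-only cast; the bit pattern is unchanged.
    pub fn get_int(&self) -> i32 {
        self.0 as _
    }

    // Reinterprets the stored bits as an IEEE 754 float.
    pub fn get_float(&self) -> f32 {
        f32::from_bits(self.0)
    }

    pub fn set_int(&mut self, value: i32) {
        self.0 = value as _;
    }

    pub fn set_float(&mut self, value: f32) {
        self.0 = value.to_bits();
    }
}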

Compare the generated code.
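To show how such a record might be used by the `add`/`addf` instructions mentioned in the comments, here is a small sketch of an interpreter loop. The `Op` enum, the `run` function, and the `from_int`/`from_float` constructors are assumptions invented for this example and are not part of the question's VM; the sketch also returns `None` on stack underflow, which a VM that trusts its static analyzer might handle differently.

```rust
/// One untagged 32-bit stack slot.
#[derive(Clone, Copy)]
pub struct Record(u32);

impl Record {
    // Hypothetical constructors, added for convenience in this sketch.
    pub fn from_int(v: i32) -> Self { Record(v as u32) }
    pub fn from_float(v: f32) -> Self { Record(v.to_bits()) }
    pub fn get_int(&self) -> i32 { self.0 as i32 }
    pub fn get_float(&self) -> f32 { f32::from_bits(self.0) }
}

/// Hypothetical instruction set; the opcode decides how a slot is read.
pub enum Op {
    PushInt(i32),
    PushFloat(f32),
    Add,  // integer add
    AddF, // float add
}

/// Runs the program and returns the top of the stack,
/// or `None` if an instruction underflows the stack.
pub fn run(program: &[Op]) -> Option<Record> {
    let mut stack: Vec<Record> = Vec::new();
    for op in program {
        match op {
            Op::PushInt(v) => stack.push(Record::from_int(*v)),
            Op::PushFloat(v) => stack.push(Record::from_float(*v)),
            Op::Add => {
                let b = stack.pop()?.get_int();
                let a = stack.pop()?.get_int();
                stack.push(Record::from_int(a.wrapping_add(b)));
            }
            Op::AddF => {
                let b = stack.pop()?.get_float();
                let a = stack.pop()?.get_float();
                stack.push(Record::from_float(a + b));
            }
        }
    }
    stack.pop()
}
```

No tag is stored per slot: as in the C union, it is the instruction that picks the interpretation, and reading a slot with the "wrong" accessor just reinterprets the bits rather than causing undefined behavior.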


¹ These functions use `transmute` internally, which reinterprets the bits just as using a union would. So when they are inlined by the optimizer, the generated code is the same.
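For comparison, here is a sketch of what the `unsafe` union route could look like next to the safe one. The names `StackRecord`, `union_int_to_float`, and `safe_int_to_float` are hypothetical and chosen only for this illustration; both functions are expected to return the same bit pattern for the same input.

```rust
/// Rust counterpart of the C `union stack_record_t` (hypothetical name).
#[derive(Clone, Copy)]
pub union StackRecord {
    i: i32,
    f: f32,
}

/// Writes the `i` field and reads the `f` field back, like the C union.
pub fn union_int_to_float(value: i32) -> f32 {
    let r = StackRecord { i: value };
    // SAFETY: both fields are 4 bytes and every 32-bit pattern is a valid f32.
    unsafe { r.f }
}

/// The same reinterpretation using only safe code.
pub fn safe_int_to_float(value: i32) -> f32 {
    f32::from_bits(value as u32)
}
```

The union version needs an `unsafe` block at every read, while the safe version keeps the same 4-byte layout without one.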

trent