
Is there a Rust equivalent of the following C++ sample (that I've written for this question):

#include <cstdint>
#include <iostream>

// Note: anonymous structs inside a union are a compiler extension in C++.
union example {
    uint32_t fullValue;

    struct {
        unsigned sixteen1: 16;
        unsigned sixteen2: 16;
    };

    struct {
        unsigned three: 3;
        unsigned twentynine: 29;
    };
};

int main() {
    example e;
    e.fullValue = 12345678;

    std::cout << e.sixteen1 << ' ' << e.sixteen2 << ' ' << e.three << ' ' << e.twentynine;
}

For reference, I'm writing a CPU emulator, and being able to easily split out the binary parts of a variable like this and reference them by different names makes the code much simpler. I know how to do this in C++ (as above), but I'm struggling to work out how to do the equivalent in Rust.

Chayim Friedman
Phil
  • I don't know Rust (enough), but I'm fairly certain that what you're doing in C++ is undefined behaviour, cf. https://stackoverflow.com/questions/67904738/is-it-undefined-behaviour-to-read-a-different-member-than-was-written-in-a-union. It might work on your compiler and OS, but in general it is not guaranteed to. – lukeg Sep 05 '22 at 11:00
  • @lukeg Hi Luke, thank you for your comment. I am aware of this - The question still stands, as I would like to achieve similar in Rust – Phil Sep 05 '22 at 11:38
  • Rust has [`union`s](https://doc.rust-lang.org/stable/reference/items/unions.html) too, however I don't think they can contain anonymous fields, so you will need to access the fields with something like `e.by_halves.sixteen1` or `e.by_xxx.three`. – Jmb Sep 05 '22 at 12:16
  • @Jmb that would still require bitfields tho. – Masklinn Sep 05 '22 at 13:29
  • In Rust it is not UB, actually... – Chayim Friedman Sep 05 '22 at 13:47
  • I ran into UB in C years ago with this same issue. The problem is that you're trying to hide/eliminate some operations by defining your data boundaries. C would need to generate masking instructions just to access these bits, on top of whatever bit manipulation you're doing. Not to mention that the spec doesn't guarantee that those union'ed types have no padding between them for alignment. IMO, you should write the code such that these operations are explicit. This is the only good way of doing hardware-level work like bit manipulation. Basically something like apilat's answer – James Newman Nov 13 '22 at 22:09
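
To illustrate the point raised in the comments about Rust unions: the sketch below shows a union with named fields. It is only an illustration, and the field name `by_halves` and the helper `halves` are hypothetical. Rust has no bitfields, so this covers only the 16/16 split, and which half comes first depends on the machine's endianness.

```rust
// A minimal sketch, assuming a named-field union is acceptable.
// `by_halves` and `halves` are hypothetical names, not from the thread.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Example {
    pub full_value: u32,
    pub by_halves: [u16; 2],
}

// Returns the two 16-bit halves in memory order.
// On a little-endian machine index 0 is the low half,
// on a big-endian machine index 0 is the high half.
pub fn halves(value: u32) -> [u16; 2] {
    let e = Example { full_value: value };
    // Reading a union field requires `unsafe`. The read is sound here
    // because every bit pattern is a valid `[u16; 2]`.
    unsafe { e.by_halves }
}
```

The 3/29 split cannot be expressed this way, which is why the answers below use shifts and masks instead.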

4 Answers


You could do this by creating a newtype struct and extracting the relevant bits using masks and/or shifts. The code to do this is slightly longer (but not by much) and, importantly, avoids the undefined behavior you are triggering in C++.

#[derive(Debug, Clone, Copy)]
struct Example(pub u32);

impl Example {
    pub fn sixteen1(self) -> u32 {
        self.0 & 0xffff
    }
    pub fn sixteen2(self) -> u32 {
        self.0 >> 16
    }
    pub fn three(self) -> u32 {
        self.0 & 7
    }
    pub fn twentynine(self) -> u32 {
        self.0 >> 3
    }
}

pub fn main() {
    let e = Example(12345678);
    println!("{} {} {} {}", e.sixteen1(), e.sixteen2(), e.three(), e.twentynine());
}
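
Setters can follow the same mask-and-shift pattern: clear the field's bits, then OR in the new value. The following is a minimal sketch and not part of the original answer; the `set_*` method names are assumptions chosen to mirror the getters.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Example(pub u32);

impl Example {
    // Getters, as in the answer above.
    pub fn sixteen1(self) -> u32 { self.0 & 0xffff }
    pub fn sixteen2(self) -> u32 { self.0 >> 16 }
    pub fn three(self) -> u32 { self.0 & 7 }
    pub fn twentynine(self) -> u32 { self.0 >> 3 }

    // Hypothetical setters: bits of `value` that do not fit
    // into the field are silently discarded.
    pub fn set_sixteen1(&mut self, value: u32) {
        // Keep the upper 16 bits, replace the lower 16.
        self.0 = (self.0 & !0xffff) | (value & 0xffff);
    }
    pub fn set_sixteen2(&mut self, value: u32) {
        // Keep the lower 16 bits, replace the upper 16.
        self.0 = (self.0 & 0xffff) | ((value & 0xffff) << 16);
    }
    pub fn set_three(&mut self, value: u32) {
        // Keep the upper 29 bits, replace the lower 3.
        self.0 = (self.0 & !7) | (value & 7);
    }
    pub fn set_twentynine(&mut self, value: u32) {
        // Keep the lower 3 bits, replace the upper 29.
        self.0 = (self.0 & 7) | (value << 3);
    }
}
```

Because all fields are views of the same `u32`, writing one field changes what the overlapping getters return, just as with the C++ union.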
apilat
  • Nit: those are getters, but not setters. Unsure if OP asked for setters, though, it's not entirely clear from his question. But the C++ union would allow setting the values as well. – Finomnis Sep 12 '22 at 15:15
  • @Finomnis Not really, as it's in pure UB territory. Code that explicitly defines the operations here is needed. If the C++ one wasn't UB, it would only work in hardware by generating the bit manipulation that you'd write here. – James Newman Nov 13 '22 at 22:13

Update

You can make some macros for extracting certain bits:

// Create a u32 mask that's all 0 except for one patch of 1's that
// begins at index `start` and continues for `len` digits.
macro_rules! mask {
    ($start:expr, $len:expr) => {
        {
            assert!($start >= 0);
            assert!($len > 0);
            assert!($start + $len <= 32);

            if $len == 32 {
                assert!($start == 0);
                0xffffffffu32
            } else {
                ((1u32 << $len) - 1) << $start
            }
        }
    }
}
const _: () = assert!(mask!(3, 7) == 0b1111111000);
const _: () = assert!(mask!(0, 32) == 0xffffffff);

// Select `num_bits` bits from `value` starting at `start`.
// For example, select_bits!(0xabcd1234, 8, 12) == 0xd12
// because the created mask is 0x000fff00.
macro_rules! select_bits {
    ($value:expr, $start:expr, $num_bits:expr) => {
        {
            let mask = mask!($start, $num_bits);
            ($value & mask) >> mask.trailing_zeros()
        }
    }
}
const _: () = assert!(select_bits!(0xabcd1234, 8, 12) == 0xd12);

Then either use these directly on a u32 or make a struct to implement taking certain bits:

struct Example {
    v: u32,
}

impl Example {
    pub fn first_16(&self) -> u32 {
        select_bits!(self.v, 0, 16)
    }

    pub fn last_16(&self) -> u32 {
        select_bits!(self.v, 16, 16)
    }

    pub fn first_3(&self) -> u32 {
        select_bits!(self.v, 0, 3)
    }

    pub fn last_29(&self) -> u32 {
        select_bits!(self.v, 3, 29)
    }
}

fn main() {
    // Use hex for more easily checking the expected values.
    let e = Example { v: 0x12345678 };
    println!("{:x} {:x} {:x} {:x}", e.first_16(), e.last_16(), e.first_3(), e.last_29());

    // Or use decimal for checking with the provided C code.
    let e = Example { v: 12345678 };
    println!("{} {} {} {}", e.first_16(), e.last_16(), e.first_3(), e.last_29());
}
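
As an alternative to macros, the same helpers could probably be written as `const fn`s, which gives type-checked arguments while still allowing compile-time assertions. This is a sketch of that swapped-in technique, not the answer's own code; it assumes a compiler recent enough to allow `assert!` inside `const fn`.

```rust
// Build a u32 mask with `len` one-bits starting at bit `start`.
pub const fn mask(start: u32, len: u32) -> u32 {
    // Reject empty or out-of-range fields.
    assert!(len > 0 && len <= 32 && start <= 32 - len);
    if len == 32 {
        // Shifting a u32 by 32 would overflow, so handle it separately.
        u32::MAX
    } else {
        ((1u32 << len) - 1) << start
    }
}

// Select `len` bits from `value`, starting at bit `start`.
pub const fn select_bits(value: u32, start: u32, len: u32) -> u32 {
    (value & mask(start, len)) >> start
}

// Compile-time checks, mirroring the ones used with the macros.
const _: () = assert!(mask(3, 7) == 0b1111111000);
const _: () = assert!(select_bits(0xabcd1234, 8, 12) == 0xd12);
```

With this version, `first_3` would be written as `select_bits(self.v, 0, 3)`.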

Original Answer

While Rust does have unions, it may be better to use a struct for your use case and just get bits from the struct's single value.

// Create a u32 mask that's all 0 except for one patch of 1's that
// begins at index `start` and continues for `len` digits.
macro_rules! mask {
    ($start:expr, $len:expr) => {
        {
            assert!($start >= 0);
            assert!($len > 0);
            assert!($start + $len <= 32);

            let mut mask = 0u32;
            for i in 0..$len {
                mask |= 1u32 << (i + $start);
            }

            mask
        }
    }
}

struct Example {
    v: u32,
}

impl Example {
    pub fn first_16(&self) -> u32 {
        self.get_bits(mask!(0, 16))
    }

    pub fn last_16(&self) -> u32 {
        self.get_bits(mask!(16, 16))
    }

    pub fn first_3(&self) -> u32 {
        self.get_bits(mask!(0, 3))
    }

    pub fn last_29(&self) -> u32 {
        self.get_bits(mask!(3, 29))
    }

    // Get the bits of `self.v` specified by `mask`.
    // Example:
    // self.v == 0xa9bf01f3
    // mask   == 0x00fff000
    // The result is 0xbf0
    fn get_bits(&self, mask: u32) -> u32 {
        // Find how many trailing zeros `mask` (in binary) has.
        // For example, the mask 0xa0 == 0b10100000 has 5.
        let mut trailing_zeros_count_of_mask = 0;
        while mask & (1u32 << trailing_zeros_count_of_mask) == 0 {
            trailing_zeros_count_of_mask += 1;
        }

        (self.v & mask) >> trailing_zeros_count_of_mask
    }
}

fn main() {
    // Use hex for more easily checking the expected values.
    let e = Example { v: 0x12345678 };
    println!("{:x} {:x} {:x} {:x}", e.first_16(), e.last_16(), e.first_3(), e.last_29());

    // Or use decimal for checking with the provided C code.
    let e = Example { v: 12345678 };
    println!("{} {} {} {}", e.first_16(), e.last_16(), e.first_3(), e.last_29());
}

This setup makes it easy to select any range of bits you want. For example, if you want to get the middle 16 bits of the u32, you just define:

pub fn middle_16(&self) -> u32 {
    self.get_bits(mask!(8, 16))
}

And you don't even really need the struct. Instead of having get_bits() be a method, you could define it to take a u32 value and mask, and then define functions like

pub fn first_3(v: u32) -> u32 {
    get_bits(v, mask!(0, 3))
}

Note

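This Rust code should behave the same regardless of your machine's endianness, because shifts and masks operate on the integer's value rather than on its byte layout in memory. I've only run it on my little-endian machine, though, so double-check it if endianness could be a problem for you.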
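
The difference between value-based and byte-based access can be sketched as follows. The function names are hypothetical; `to_le_bytes`, `to_ne_bytes`, `from_le_bytes` and `from_ne_bytes` are standard library methods on integer types.

```rust
// Value-based: a mask gives the same result on every platform.
pub fn low_16_by_mask(v: u32) -> u32 {
    v & 0xffff
}

// Byte-based with an explicit byte order: also portable,
// because the layout is fixed to little-endian by the conversion.
pub fn low_16_from_le_bytes(v: u32) -> u16 {
    let b = v.to_le_bytes();
    u16::from_le_bytes([b[0], b[1]])
}

// Byte-based with the native byte order: this reads the first two
// bytes as they sit in memory, so the result depends on the platform
// (low half on little-endian, high half on big-endian).
pub fn first_two_native_bytes(v: u32) -> u16 {
    let b = v.to_ne_bytes();
    u16::from_ne_bytes([b[0], b[1]])
}
```

Only the last function mirrors what a union or a pointer cast would observe, which is where endianness starts to matter.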

Andrew
  • Note that the `mask!` macro can be written more succinctly as `((1 << len) - 1) << start`, as long as you handle the edge case of shifting 32 bits correctly. – apilat Sep 08 '22 at 12:38

You could use the bitfield crate.

This appears to approximate what you are looking for at least on a syntactic level.

BitTickler

For reference, your original C++ code prints:

24910 188 6 1543209

Now there is no built-in functionality in Rust for bitfields, but there is the bitfield crate.

It allows specifying a newtype struct and then generates setters/getters for parts of the wrapped value.

For example pub twentynine, set_twentynine: 31, 3; means that it should generate the setter set_twentynine() and getter twentynine() that sets/gets the bits 3 through 31, both included.

So, translating your C++ union into a Rust bitfield, this is what it could look like:

use bitfield::bitfield;

bitfield! {
    pub struct Example (u32);

    pub full_value, set_full_value: 31, 0;

    pub sixteen1, set_sixteen1: 15, 0;
    pub sixteen2, set_sixteen2: 31, 16;

    pub three, set_three: 2, 0;
    pub twentynine, set_twentynine: 31, 3;
}

fn main() {
    let mut e = Example(0);
    e.set_full_value(12345678);

    println!(
        "{} {} {} {}",
        e.sixteen1(),
        e.sixteen2(),
        e.three(),
        e.twentynine()
    );
}

This prints:

24910 188 6 1543209

Note that the generated setters/getters are small enough that the compiler will very likely inline them, so they should add little to no overhead.

Of course if you want to avoid adding an additional dependency and instead want to implement the getters/setters by hand, look at @apilat's answer instead.


Alternative: the c2rust-bitfields crate:

use c2rust_bitfields::BitfieldStruct;

#[repr(C, align(1))]
#[derive(BitfieldStruct)]
struct Example {
    #[bitfield(name = "full_value", ty = "u32", bits = "0..=31")]
    #[bitfield(name = "sixteen1", ty = "u16", bits = "0..=15")]
    #[bitfield(name = "sixteen2", ty = "u16", bits = "16..=31")]
    #[bitfield(name = "three", ty = "u8", bits = "0..=2")]
    #[bitfield(name = "twentynine", ty = "u32", bits = "3..=31")]
    data: [u8; 4],
}

fn main() {
    let mut e = Example { data: [0; 4] };

    e.set_full_value(12345678);

    println!(
        "{} {} {} {}",
        e.sixteen1(),
        e.sixteen2(),
        e.three(),
        e.twentynine()
    );
}

This prints:

24910 188 6 1543209

The advantage of this one is that you can specify the type of each union part yourself; with the first crate, they were all u32.

I'm unsure, however, how endianness plays into this one. It might yield different results on a system with different endianness. It might require further research to be sure.

Finomnis