
The Rust Reference documents that a Rust enum annotated with #[repr(C)] can be viewed as a C struct of two fields. The first field is a C enum for the discriminant; the second field is a C union of C structs corresponding to the fields of the enum's variants.

Due to a bug in an FFI interoperation library, I need to avoid using unions that are exactly 8 bytes. To that end, I wanted to add some static assertions to my Rust code so I would be aware of any problematic enums. I do not know how to ask the compiler for the size of the generated union type (or equivalently, the size of the enum without accounting for the discriminant):

#[repr(C)]
enum UnionSizeIs8Bytes {
    A(u8),
    B(u64),
}

#[repr(C)]
enum UnionSizeIsNot8Bytes {
    A(u8),
    B(u16),
}

const _: () = {
    // Should fail, but does not
    assert!(8 != std::mem::size_of::<UnionSizeIs8Bytes>());

    // Should not fail, but does
    assert!(8 != std::mem::size_of::<UnionSizeIsNot8Bytes>());
};
Shepmaster
  • Maybe you could use a macro solution, something like [this playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=cacd6b002e35cdb0adb8bd61b50312d4)? – rodrigo Aug 31 '22 at 19:35
  • @rodrigo yeah, that might be the most straight-forward solution, it's just sad that we have to reconstruct the work that the compiler is doing for us. Amusingly, the current solution we are using is to do this on the JS side where we already have the decomposed pieces written. – Shepmaster Aug 31 '22 at 19:39
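
A minimal sketch of what such a macro solution could look like. This is not the code from the linked playground; the macro name `repr_c_enum_with_payload`, the payload type names, and the `PAYLOAD_SIZE` constant are all hypothetical. It assumes every variant is a tuple variant with exactly one field whose type is `Copy` (so it can be a union field directly).

```rust
// Hypothetical macro: declares the repr(C) enum and, next to it, a
// repr(C) union with one field per variant, so the compiler itself
// reports the payload size.
// Assumption: each variant has exactly one `Copy` field.
macro_rules! repr_c_enum_with_payload {
    ($name:ident, $payload:ident { $($variant:ident($ty:ty)),+ $(,)? }) => {
        #[repr(C)]
        #[allow(dead_code)]
        enum $name {
            $($variant($ty)),+
        }

        // The variant names are reused as field names, hence the lint allowance.
        #[repr(C)]
        #[allow(dead_code, non_snake_case)]
        union $payload {
            $($variant: $ty),+
        }

        impl $name {
            /// Size of the payload union, as computed by the compiler.
            #[allow(dead_code)]
            const PAYLOAD_SIZE: usize = std::mem::size_of::<$payload>();
        }
    };
}

repr_c_enum_with_payload!(UnionSizeIs8Bytes, UnionSizeIs8BytesPayload { A(u8), B(u64) });
repr_c_enum_with_payload!(UnionSizeIsNot8Bytes, UnionSizeIsNot8BytesPayload { A(u8), B(u16) });

// Compile-time check for an enum that is expected to be safe.
const _: () = assert!(UnionSizeIsNot8Bytes::PAYLOAD_SIZE != 8);
```

Adding the same assertion for `UnionSizeIs8Bytes` would stop compilation, which is the desired warning. The cost is that the enum has to be declared through the macro, so the decomposition is still written by hand once, inside the macro.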

1 Answer


From the Rust Reference, on repr(C) field-less enums:

[...] the C representation has the size and alignment of the default enum size and alignment for the target platform's C ABI.

That is, they try to be fully compatible with C enums.

And in the next section about struct-like enums:

[...] is a repr(C) struct with two fields:

  • a repr(C) version of the enum with all fields removed ("the tag")
  • a repr(C) union of repr(C) structs for the fields of each variant that had them ("the payload")

That is, your enum:

#[repr(C)]
enum UnionSizeIs8Bytes {
    A(u8),
    B(u64),
}

has the same layout as this other one:

#[repr(C)]
enum UnionSizeIs8Bytes_Tag {
    A,
    B,
}
#[repr(C)]
union UnionSizeIs8Bytes_Union {
   a: u8,
   b: u64,
}
#[repr(C)]
struct UnionSizeIs8Bytes_Explicit {
    tag: UnionSizeIs8Bytes_Tag,
    data: UnionSizeIs8Bytes_Union,
}
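
As a sanity check, the explicit decomposition can be compared against the enum at compile time, and the union's size can then be read directly. This is only a sketch: it reuses the type names from the answer, the `PAYLOAD_SIZE` constant is a hypothetical helper, and the union fields stand in for the single-field variant structs that the Reference describes.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[allow(dead_code)]
enum UnionSizeIs8Bytes {
    A(u8),
    B(u64),
}

#[repr(C)]
#[allow(dead_code, non_camel_case_types)]
enum UnionSizeIs8Bytes_Tag {
    A,
    B,
}

#[repr(C)]
#[allow(dead_code, non_camel_case_types)]
union UnionSizeIs8Bytes_Union {
    a: u8,
    b: u64,
}

#[repr(C)]
#[allow(dead_code, non_camel_case_types)]
struct UnionSizeIs8Bytes_Explicit {
    tag: UnionSizeIs8Bytes_Tag,
    data: UnionSizeIs8Bytes_Union,
}

// The hand-written struct should agree with the enum in size and alignment.
const _: () = {
    assert!(size_of::<UnionSizeIs8Bytes>() == size_of::<UnionSizeIs8Bytes_Explicit>());
    assert!(align_of::<UnionSizeIs8Bytes>() == align_of::<UnionSizeIs8Bytes_Explicit>());
};

// Hypothetical helper constant: the payload size, straight from the compiler.
#[allow(dead_code)]
const PAYLOAD_SIZE: usize = size_of::<UnionSizeIs8Bytes_Union>();
```

The drawback is that the explicit types have to be kept in sync with the enum by hand.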

Now, what is the actual size and alignment of an enum in C? It seems that even experts do not fully agree on the details. In practice, most mainstream C compilers use a plain int as the underlying type of an enum, which corresponds to an i32 or u32 in Rust.

With that in mind, the layout of your examples should be straightforward:

  • UnionSizeIs8Bytes:
    • 0-4: tag
    • 4-8: padding
    • 8-16: union
      • 8-9: u8
      • 8-16: u64
    • Size: 16, alignment: 8
  • UnionSizeIsNot8Bytes:
    • 0-4: tag
    • 4-6: union
      • 4-5: u8
      • 4-6: u16
    • 6-8: padding
    • Size: 8, alignment: 4

Note that the alignment of a repr(C) enum is never less than that of the tag, that is 4 bytes using the above assumptions.
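
The sizes and alignments listed above can be checked by the compiler. The sketch below assumes a 64-bit target where the C enum tag is 4 bytes and u64 is 8-byte aligned (x86_64, for example); on other targets the numbers may differ, so the assertions are gated behind a cfg.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[allow(dead_code)]
enum UnionSizeIs8Bytes {
    A(u8),
    B(u64),
}

#[repr(C)]
#[allow(dead_code)]
enum UnionSizeIsNot8Bytes {
    A(u8),
    B(u16),
}

// Assumption: 4-byte tag and 8-byte-aligned u64, as on x86_64.
#[cfg(target_pointer_width = "64")]
const _: () = {
    // tag (4) + padding (4) + union (8)
    assert!(size_of::<UnionSizeIs8Bytes>() == 16);
    assert!(align_of::<UnionSizeIs8Bytes>() == 8);

    // tag (4) + union (2) + padding (2)
    assert!(size_of::<UnionSizeIsNot8Bytes>() == 8);
    assert!(align_of::<UnionSizeIsNot8Bytes>() == 4);
};
```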

To compute the size of the data without the tag, subtract the alignment from the full size. The alignment accounts for the size of the tag itself plus any padding needed before the union.

const fn size_of_enum_data<T>() -> usize {
    std::mem::size_of::<T>() - std::mem::align_of::<T>()
}

If you want to be extra sure you could subtract std::mem::align_of::<T>().max(std::mem::size_of::<i32>()), in case your architecture's i32 does not have alignment equal to 4, but unfortunately max doesn't seem to be const yet. You could write an if of course, but that gets ugly, something like:

const fn size_of_enum_data<T>() -> usize {
    let a = std::mem::align_of::<T>();
    let i = std::mem::size_of::<i32>();
    std::mem::size_of::<T>() - if a > i { a } else { i }
}

And if you want to be extra, extra sure, you can use c_int instead of i32. But on esoteric architectures where c_int != i32, the assumption that a C enum has the size of a C int may not hold either...
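
A sketch of that c_int variant, using `std::ffi::c_int` from the standard library. It is the same subtraction as before with only the tag-size assumption changed, so it still relies on the C enum being the size of a C int.

```rust
use std::ffi::c_int;
use std::mem::{align_of, size_of};

// Same subtraction as above, but the assumed tag size comes from c_int
// rather than being hard-coded as i32.
const fn size_of_enum_data<T>() -> usize {
    let a = align_of::<T>();
    let i = size_of::<c_int>();
    size_of::<T>() - if a > i { a } else { i }
}

#[repr(C)]
#[allow(dead_code)]
enum UnionSizeIs8Bytes {
    A(u8),
    B(u64),
}
```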

Then your assertions would be (playground):

const _: () = {
    // It fails
    assert!(8 != size_of_enum_data::<UnionSizeIs8Bytes>());

    // It does not fail
    assert!(8 != size_of_enum_data::<UnionSizeIsNot8Bytes>());
};
rodrigo
  • *what is the actual size and alignment of an enum in C* — to be clear, I want to know the size of the generated **`union`** — the "payload", not the "tag". – Shepmaster Aug 31 '22 at 18:47
  • *the layout of your examples* — that's true, but I'm looking for a compiler-driven solution, not a manual one. The manually-computed solution doesn't lend itself to a compile-time assertion that can be checked without me remembering to do so! – Shepmaster Aug 31 '22 at 18:49
  • Please let me know how I can reword my original post to make these two points clearer. – Shepmaster Aug 31 '22 at 18:49
  • @Shepmaster: I see... the size of the payload is just the size of the enum minus the `align_of(enum).max(size_of(i32))`. I'm not sure about that compiler-driven thing: you still need to call `assert!` no matter if the function is a manual or a compiler computation. Maybe if you add an example of how you would like to use that? I assumed that you just wanted your assertions to work as commented. – rodrigo Aug 31 '22 at 18:55
  • @rodrigo There's a lot more nuance than that; namely there are two places for padding: in between the tag and the union, as well as after the union. This means it's actually **impossible** to calculate union size from just the size and alignment of the whole enum. See [here](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=7dfc5838b9958a0d600072806c324847) for a counterexample. – Aplet123 Aug 31 '22 at 19:09
  • @Shepmaster: Ah, yes. Now I'm curious about how the linked bug relates to `repr(C)` enums, because there you talk only of unions. I was assuming that you were going to access the union by using raw or C pointers and that you would get the bytes from the start of the union to the end of the whole `repr`. Because if you know how many bytes you have to use, you already have the answer to the original question! Maybe if you add an example like `return_a_union` from the bug report but that uses an `enum` instead and triggers the bug? – rodrigo Aug 31 '22 at 19:21
  • I don't want to put too much Rust-specific information into the bug report for the JS-focused library. Really, I debated rewriting my Rust examples in C to be easier for the maintainers to understand, but decided that it should be clear enough. However, in the original code, we {accept,return} these `repr(C)` enums {directly,embedded in other types}. That means that the JS client needs to declare the 3 component types (struct, enum, union). Somewhere along the line, the ABI calculations for Windows x86_64 and the computation of the `union` size have a mismatch and cause memory corruption. – Shepmaster Aug 31 '22 at 19:36
  • I meant to add the example with enums here, not in the bug. I still do not see what piece of the FFI chain is able to get to these 8 bytes from an enum. If that piece is able to get those 8 bytes, sure we can do too. – rodrigo Aug 31 '22 at 19:51
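
The padding concern raised in the comments can be reproduced with a small sketch. The types below are hypothetical and are not taken from the linked playground; they assume a 4-byte C enum tag. The payload is 6 bytes with 2-byte alignment, so the enum carries trailing padding that the size-minus-alignment subtraction counts as payload.

```rust
use std::mem::{align_of, size_of};

// The subtraction heuristic from the answer.
const fn size_of_enum_data<T>() -> usize {
    size_of::<T>() - align_of::<T>()
}

// Hypothetical enum whose payload is 6 bytes with 2-byte alignment.
#[repr(C)]
#[allow(dead_code)]
enum SixBytePayload {
    A(u8),
    B([u16; 3]),
}

// Hand-written equivalent of the payload union, for comparison.
#[repr(C)]
#[allow(dead_code)]
union SixBytePayloadUnion {
    a: u8,
    b: [u16; 3],
}

#[allow(dead_code)]
const HEURISTIC_SIZE: usize = size_of_enum_data::<SixBytePayload>();
#[allow(dead_code)]
const ACTUAL_SIZE: usize = size_of::<SixBytePayloadUnion>();
```

Under the 4-byte tag assumption the enum is 12 bytes with alignment 4 (tag at 0-4, union at 4-10, padding at 10-12), so the heuristic reports 8 while the union is 6 bytes. An `assert!(8 != ...)` on the heuristic would therefore reject an enum whose union is not 8 bytes.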