I've seen some Rust codebases use the `#[repr(C)]` macro (is that what it's called?). However, I couldn't find much information about it beyond the fact that it sets the type's layout in memory to match C's.

Here's what I would like to know: is this a preprocessor directive that belongs to the compiler rather than to the language itself (even though there aren't any other compiler front-ends for Rust), and why does Rust even have a memory layout different from C's? (I ask because I've never had to do this in another language.)

Here's a situation that demonstrates what I mean: if someone creates another compiler for Rust, are they required to implement this macro, or is it a compiler-specific thing?

Cody Gray - on strike
skevelis
  • https://github.com/rust-lang/reference/blob/8e7d614303b0dec7492e048e63855fcd3b944ec8/src/type-layout.md#the-c-representation `Is repr(C) a preprocessor directive?` Rust has preprocessor? What kind of "pre-processor" do you have in mind? – KamilCuk Apr 23 '21 at 20:02
  • @KamilCuk I think I'm using the wrong term here, definitely correct me if you know the correct one, but I meant that the compiler front-end preprocesses the structure to which this macro is applied with this layout, and then does the compilations/optimisations. – skevelis Apr 23 '21 at 20:07
  • Sure it does, that's what a compiler does - processes the text and then generates code. – KamilCuk Apr 23 '21 at 20:08
  • @KamilCuk So my question stands, right? – skevelis Apr 23 '21 at 20:09
  • `are they required` - well, there's no "Rust police" that will come and fine you for not implementing it.... – KamilCuk Apr 23 '21 at 20:12
  • @KamilCuk yeah but the compiler wouldn't compile all Rust codebases then. Take C++ for example, not all compilers implement `#pragma`, but _most_ code still compiles fine. I'm just asking if that's the case with Rust, and whether or not `#[repr(C)]` is similar to `#pragma` in any way in the preprocessing and in how it is interpreted by different compilers. – skevelis Apr 23 '21 at 20:16
  • Whether something is a preprocessor macro doesn't have anything to do with whether it's required for a C implementation. `#define` is also a preprocessor macro, but all C implementations are required to implement it by the standard nevertheless. – trent Apr 23 '21 at 21:19
  • Does [Is there a published language format standard for Rust yet?](https://stackoverflow.com/q/21177436/3650362) answer your question? (There isn't) – trent Apr 23 '21 at 21:21
  • I think you're confused by the way C programmers sometimes talk about things being "part of the language" or not. There's a sense in which preprocessor directives are "not part of the language" because they are handled in a completely separate pass that doesn't know anything about C syntax or semantics. But preprocessor directives like `#define` and `#include` are still part of C in that they're standardized and any C implementation has to support them in the way the standard says. On the other hand, something like `__attribute__ ((__packed__))` is the opposite: it is handled by the compiler, – trent Apr 23 '21 at 21:29
  • ... not the preprocessor, but it's "not part of the language" in a different way, because it's not defined by the standard and different compilers may not support it or may handle it differently. `#pragma` is an example of something that is "not part of the language" *both* ways: it's a preprocessor directive, so it doesn't interact with the syntax or semantics of the rest of C, *and* it's a nonstandard extension. So it's understandable you might get those things mixed up, even though they're really orthogonal. – trent Apr 23 '21 at 21:32
  • (Even `#pragma` may be considered "part of C" if you consider "C" a family of dialects rather than strictly what's defined by ISO. `#pragma` in particular is so nearly ubiquitous, you could consider it a de facto standard). – trent Apr 23 '21 at 21:34
  • Just a small sidenote, `#pragma` [is part of C](https://port70.net/~nsz/c/c11/n1570.html#6.10.6p1). And while `#pragma` is a preprocessing directive, the compiler has to make decisions after the preprocessor has run: `#pragma STDC FENV_ACCESS ON` inside a function affects only that function, so the compiler needs to know where the function ends, which is not something the preprocessor tracks. – KamilCuk Apr 23 '21 at 22:21
  • @trentcl Thank you for all the well written explanations, they've helped a lot! – skevelis Apr 24 '21 at 04:29

2 Answers

`#[repr(C)]` is not a preprocessor directive, since Rust doesn't use a preprocessor¹. It is an attribute. Rust doesn't have a complete specification, but the `repr` attribute is documented in the Rust Reference, so it is absolutely a part of the language. Implementation-wise, attributes are parsed the same way as all other Rust code and are stored in the same AST. Rust has no "attribute pass": attributes are an actual part of the language. If someone else were to implement a Rust compiler, they would need to implement `#[repr(C)]`.

Furthermore, `#[repr(C)]` can't be implemented without some compiler magic. In the absence of a `#[repr(...)]` attribute, Rust compilers are free to arrange the fields of a struct or enum however they want (and they do take advantage of this for optimization purposes!).
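To make the difference concrete, here is a minimal sketch that compares the same three fields under the default representation and under `repr(C)`. The struct names `DefaultLayout` and `CLayout` are made up for illustration. The `repr(C)` size follows from C's layout rules, assuming `u32` has 4-byte alignment as it does on mainstream targets; the default-representation size is whatever your compiler picks and is not guaranteed.

```rust
use std::mem::{align_of, size_of};

// Default (Rust) representation: the compiler is free to reorder the fields,
// for example to put the two `u8`s next to each other and save padding.
#[allow(dead_code)]
struct DefaultLayout {
    a: u8,
    b: u32,
    c: u8,
}

// C representation: fields stay in declaration order with C-style padding,
// so (with 4-byte-aligned `u32`) this is 1 + 3 padding + 4 + 1 + 3 padding.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    println!(
        "default: size {}, align {}",
        size_of::<DefaultLayout>(),
        align_of::<DefaultLayout>()
    );
    println!(
        "repr(C): size {}, align {}",
        size_of::<CLayout>(),
        align_of::<CLayout>()
    );
}
```

With a current rustc I would expect the default layout to come out smaller than the `repr(C)` one, but only the `repr(C)` number is something you can rely on.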

Rust does have a good reason for using its own memory layout. If compilers aren't tied to how a struct is written in the source code, they can do optimisations like not storing struct fields that are never read from, reordering fields for better performance, enum tag pooling², and using spare bits throughout `NonZero*`s in the struct to store data (the last one isn't happening yet, but might in the future). But the main reason is that Rust has things that just don't make sense in C. For instance, Rust has zero-sized types (like `()` and `[i8; 0]`), which can't exist in C, as well as trait vtables, enums with fields, and generic types, all of which cause problems when you try to translate them to C.


¹ Okay, you could use the C preprocessor with Rust if you really wanted to. Please don't.

² For example, `enum Food { Apple, Pizza(Topping) }` together with `enum Topping { Pineapple, Mushroom, Garlic }` can be stored in just 1 byte, since there are only 4 possible `Food` values that can be created.
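If you want to check the footnote's enum example on your own toolchain, a small sketch like the following prints the sizes. The enums are the ones from the footnote; the printed numbers depend on the compiler, since the default representation makes no layout promises.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum Topping {
    Pineapple,
    Mushroom,
    Garlic,
}

#[allow(dead_code)]
enum Food {
    Apple,
    Pizza(Topping),
}

fn main() {
    // `Topping` only needs three distinct values, so the unused bit patterns
    // of its tag are available to represent `Food::Apple`.
    println!("Topping: {} byte(s)", size_of::<Topping>());
    println!("Food:    {} byte(s)", size_of::<Food>());
}
```

If both lines report the same size, the compiler has folded the `Food` tag into the spare values of `Topping` rather than adding a separate tag byte.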

smitop
  • This was an *excellent* explanation. I'm surprised the Rust compiler is already doing such optimisations. I can't promise you not to use the C preprocessor though ;) – skevelis Apr 24 '21 at 04:36
What is this?

It is not a macro; it is an attribute.

The book has a good chapter on what macros are and it mentions that there are "Attribute-like macros":

The term macro refers to a family of features in Rust: declarative macros with macro_rules! and three kinds of procedural macros:

  • Custom #[derive] macros that specify code added with the derive attribute used on structs and enums
  • Attribute-like macros that define custom attributes usable on any item
  • Function-like macros that look like function calls but operate on the tokens specified as their argument

Attribute-like macros are macros that you apply with the same syntax as attributes. For example:

#[route(GET, "/")]
fn index() {}

It does look like the repr attribute, doesn't it?

So what is an attribute then?

Luckily Rust has great resources like rust-by-example which includes:

An attribute is metadata applied to some module, crate or item. This metadata can be used to/for:

  • conditional compilation of code
  • set crate name, version and type (binary or library)
  • disable lints (warnings)
  • enable compiler features (macros, glob imports, etc.)
  • link to a foreign library
  • mark functions as unit tests
  • mark functions that will be part of a benchmark
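A few of those uses can be sketched in one small program. Everything here uses built-in attributes only; the names `Pair`, `unused_helper` and `build_kind` are made up for illustration.

```rust
// Built-in attributes are metadata that the compiler itself understands.

#[derive(Debug, Clone, Copy, PartialEq)] // generate trait implementations
#[repr(C)] // request the C layout for this struct
struct Pair {
    left: u16,
    right: u16,
}

#[allow(dead_code)] // disable the "unused function" lint for this item
fn unused_helper() {}

// Conditional compilation: exactly one of these two definitions is compiled,
// depending on whether debug assertions are enabled for the build.
#[cfg(debug_assertions)]
fn build_kind() -> &'static str {
    "debug"
}

#[cfg(not(debug_assertions))]
fn build_kind() -> &'static str {
    "release"
}

fn main() {
    let pair = Pair { left: 1, right: 2 };
    println!("{:?} sums to {}", pair, pair.left + pair.right);
    println!("compiled as a {} build", build_kind());
}
```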

The Rust Reference is also where you usually look when you need to know something in more depth (see its chapter on attributes).

To the compiler authors out there:

If you were to write a Rust compiler and wanted to support things like the standard library or other crates, then you would 100% need to implement these, because the libraries use them and depend on them.

Otherwise, I guess you could come up with a subset of Rust that your compiler supports. But then most people wouldn't use it.

Why does Rust not just use the C layout?

The nomicon explains why Rust needs to be able to reorder the fields of structs, for example: doing so saves space and is more efficient. It is related to, among other things, generics and monomorphization. With repr(C), the fields of a struct must be laid out in the same order as in the definition.

The C representation is designed for dual purposes. One purpose is for creating types that are interoperable with the C Language. The second purpose is to create types that you can soundly perform operations on that rely on data layout such as reinterpreting values as a different type.
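Both purposes from that quote can be sketched with one small type. This is only an illustration under stated assumptions: `Point`, `as_array` and `sum` are hypothetical names, and `sum` is called from Rust here, although the same `extern "C"` signature is what C code would call.

```rust
// Because of `repr(C)`, `x` is guaranteed to come first and `y` second,
// with no padding between two `f32` fields.
#[repr(C)]
#[derive(Debug, Clone, Copy)]
struct Point {
    x: f32,
    y: f32,
}

// Purpose 2: reinterpret the value as a different type.
fn as_array(p: Point) -> [f32; 2] {
    // SAFETY: `Point` is `repr(C)` with exactly two `f32` fields, so it has
    // the same size, alignment and field order as `[f32; 2]`.
    unsafe { std::mem::transmute::<Point, [f32; 2]>(p) }
}

// Purpose 1: a function using the C calling convention that takes the struct
// by value, which is only meaningful across languages with a C-compatible layout.
extern "C" fn sum(p: Point) -> f32 {
    p.x + p.y
}

fn main() {
    let p = Point { x: 1.0, y: 2.0 };
    println!("{:?} as array: {:?}", p, as_array(p));
    println!("sum: {}", sum(p));
}
```

Without `#[repr(C)]`, the `transmute` would still compile because the sizes match, but nothing would guarantee which field ends up at index 0.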

Hadus