
I read a book published by Apress named *Beginning Rust - Get Started with Rust 2021 Edition*.

In one of the code examples, the author does not explain clearly how the code works. Here is the code snippet:

/* In a 64-bit system, it prints:
16 16 16; 8 8 8
In a 32-bit system, it prints:
8 8 8; 4 4 4
*/
fn main() {
    use std::mem::*;
    let a: &str = "";
    let b: &str = "0123456789";
    let c: &str = "abcdè";
    print!("{} {} {}; ",
        size_of_val(&a),
        size_of_val(&b),
        size_of_val(&c));
    print!("{} {} {}",
        size_of_val(&&a),
        size_of_val(&&b),
        size_of_val(&&c));
}

My question is how this works. `size_of_val` takes a reference, and the variables are already declared as `&str`, so why does the author put another ampersand before the variable in the first `print!` statement? Also, when we pass the variable without an additional ampersand, such as `size_of_val(a)`, the sizes we get are 0 for `a`, 10 for `b`, and 6 for `c`; but when we pass it with the ampersand, such as `size_of_val(&a)`, the sizes are 16 16 16 (or 8 8 8 on 32-bit), as described in the comment above `main`. Finally, in the second `print!` macro the author uses double ampersands to get the size of the reference. How does that work? I thought this would generate an error, since `size_of_val` accepts only one reference, yet the first macro adds another ampersand and the second adds two.

Herohtar

1 Answer


The size_of_val() function is declared as follows:

pub fn size_of_val<T>(val: &T) -> usize
where
    T: ?Sized, 

That means: given any type `T` (the `?Sized` bound means "really any type, even unsized ones"), it takes a reference to `T` and returns a `usize`.
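A minimal sketch of what that bound buys us (the integer and string values here are arbitrary illustrations, not from the book):

```rust
use std::mem::size_of_val;

fn main() {
    let n: u32 = 7;        // u32 is Sized
    let s: &str = "hello"; // str is unsized, but &str is accepted with T = str

    // T inferred as u32: the size of the value itself.
    println!("{}", size_of_val(&n)); // prints 4

    // T inferred as str: the byte length of the string data.
    println!("{}", size_of_val(s)); // prints 5
}
```

Without `T: ?Sized`, the second call would not compile, because generic parameters are `Sized` by default.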

Let's take a as an example (b and c are the same).

When we evaluate size_of_val(a), the compiler knows that a has type &str, and thus it infers the generic parameter to be str (without a reference). The full call is size_of_val::<str>(a /* &str */), which matches the signature: we pass &str for T == str.

What is the size of a str? str is a contiguous sequence of bytes encoding the string as UTF-8. a contains "", the empty string, which is of course zero bytes long, so size_of_val() returns 0. b holds 10 ASCII characters, each one byte long when UTF-8 encoded, so 10 bytes in total. c contains four ASCII chars (abcd), so four bytes, plus one Unicode character (è) that is two bytes wide, encoded as \xC3\xA8 (195 and 168 in decimal), for a total of six bytes.
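A quick check of those byte counts, using the same variables as the book's snippet (note that for a str, size_of_val() agrees with .len(), since both report the UTF-8 byte length):

```rust
use std::mem::size_of_val;

fn main() {
    let a: &str = "";
    let b: &str = "0123456789";
    let c: &str = "abcdè";

    // T = str: the result is the UTF-8 byte length of the string data.
    assert_eq!(size_of_val(a), 0);
    assert_eq!(size_of_val(b), 10);
    assert_eq!(size_of_val(c), 6);

    // "è" alone is two bytes in UTF-8: 0xC3 0xA8.
    assert_eq!("è".as_bytes(), &[0xC3, 0xA8]);

    println!("all byte counts check out");
}
```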

What happens when we calculate size_of_val(&a)? &a is &&str because a is &str, so the compiler infers T to be &str. The size of &str is constant and, in practice, double the size of a pointer: &str, i.e. a pointer to str, must include both the data address and the length. On 64-bit platforms this is 16 (8 × 2); on 32-bit ones it is 8 (4 × 2). Such a pointer is called a fat pointer: a pointer that carries additional metadata besides just the address (note that it is not guaranteed to be exactly double the pointer size, so don't rely on it, but in practice it is).
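You can see the fat-pointer size directly, without going through a value, by asking for the size of the &str type itself (this relies on the double-word layout that holds in practice but, as noted above, is not formally guaranteed):

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    let a: &str = "";

    // size_of_val(&a) measures the &str value itself, so it equals
    // the static size of the &str type.
    assert_eq!(size_of_val(&a), size_of::<&str>());

    // A fat pointer = data address + length: two machine words in practice.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    println!("&str is {} bytes here", size_of::<&str>());
}
```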

When we evaluate size_of_val(&&a), the type of &&a is &&&str, so T is inferred to be &&str. While &str (a pointer to str) is a fat pointer and thus double-sized, a pointer to a fat pointer is an ordinary thin pointer (the opposite of a fat pointer: it carries only the address, without any additional metadata), so it is one machine word: 8 bytes on 64-bit platforms or 4 bytes on 32-bit ones.
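Putting the two cases side by side makes the fat-vs-thin distinction concrete (again assuming the usual layout, which holds on mainstream platforms):

```rust
use std::mem::{size_of, size_of_val};

fn main() {
    let a: &str = "";

    // &&str is a thin pointer to a fat pointer: one machine word.
    assert_eq!(size_of_val(&&a), size_of::<&&str>());
    assert_eq!(size_of::<&&str>(), size_of::<usize>());

    // The fat pointer itself (&str) is two machine words.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    println!("&&str: {} bytes, &str: {} bytes",
        size_of::<&&str>(), size_of::<&str>());
}
```

This is exactly why the book's second `print!` shows half the numbers of the first: &&a has an extra level of indirection, and that outer pointer no longer needs to carry a length.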

Chayim Friedman
  • I understand the first and second case scenarios, but I am wondering about the third case: why pass a pointer to a pointer to a string variable? Why does the author bother doing that to get the size of the buffer area? Also, could you explain a little more about the thin pointer and the machine word size? Thanks a lot. Pretty good explanation. – kenryuakuma Apr 29 '22 at 04:44
  • A machine word size is just the size of `usize` (`size_t`, ignoring esoteric platforms), i.e. 64 bits (8 bytes) on 64-bit platforms, 32 bits on 32-bit platforms, etc. A "thin pointer" is the opposite of a fat pointer: a fat pointer carries additional metadata besides the pointer itself (the length of the `str` in this example), a thin pointer does not, and thus it's just a normal pointer, `usize` sized. As for your first question: what do you mean by "Why bother doing that to get the size of the buffer area"? – Chayim Friedman Apr 29 '22 at 04:47
  • I understand that the first case, size_of_val(a), gets the size of the buffer area of the str, and the second case gets the size of the &str, but I'm not sure why the author bothered evaluating size_of_val(&&a), going from a fat pointer to a thin pointer. – kenryuakuma Apr 29 '22 at 04:51
  • 1
    @kenryuakuma I cannot read their mind, but I guess it was done to show that `&str` is a fat pointer and explain the difference. In practice, indeed, double pointers are rarely used in Rust, though they do have some usages: for example, there are some cases where the size of the structure is very important so you may want a double pointer to reduce the size by a half. – Chayim Friedman Apr 29 '22 at 04:53