13

I don't understand the difference between a slice and a reference. What is the difference between &String and &str? I read some stuff online that said a reference was a thin pointer and slice is a fat pointer, but I don't know and can't seem to find what those two mean. I know that a slice can coerce into a reference, but how does it do that? What is the Deref trait?

nbro
oberblastmeister
  • `I read some stuff online` https://stackoverflow.com/questions/57754901/what-is-a-fat-pointer-in-rust – KamilCuk Apr 11 '20 at 01:39
  • I read it but I don't understand why a slice has unknown size at compile time for example &str. – oberblastmeister Apr 11 '20 at 01:48
  • Given a variable `x` of type `&str` - what size does it have? Compare it with variables of type `i8` (1 byte) or `i128` (16 bytes) or a `struct Foo {x:i8,y:i128}` (1 byte + 16 bytes = 17 bytes, ignoring alignment). A variable of type `&Foo` will use the size of the pointer, and dereferencing it will give you an object of size 17 bytes. Since the size of `str` or `[i8]` is variable, it has to be stored alongside the address, making a fat pointer out of the reference. – CoronA Apr 11 '20 at 03:36
  • Commenting just to link to [What is the difference between a slice and an array?](https://stackoverflow.com/q/30794235/3650362) – trent Apr 11 '20 at 04:34

2 Answers

30

In Rust, a slice is a contiguous block of homogeneously typed data of varying length.

What does this mean?

  • [u8] is a slice. In memory, this is a block of u8s. The slice itself is the data. Many times though, people refer to &[u8] as a slice. A &[u8] is a reference to that block of data. That reference contains two things: a pointer to the data itself, and the length of the data. Since it contains two things, it is called a fat pointer. A &u8 is also a reference (can also be thought of as a pointer in this case *), but we already know that whatever it points to will be a single u8. Therefore, it is a thin pointer, since it only needs to store one thing: the address.

    You are guaranteed that all the data in a [u8] is of type u8.

    Since your [u8] is just defined as a contiguous block of memory of type u8, there's no compile time definition as to how large it is. Hence, we need to store its length in a pointer to it. We also can't put it on the stack (This translates to: we can't have a local variable that is just a [u8] **).
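The thin versus fat distinction can be checked by asking the compiler for the sizes. This is a minimal sketch; `pointer_sizes` is a made-up helper name, and the two-word layout of `&[u8]` is what current compilers produce rather than something this answer specifies:

```rust
use std::mem::size_of;

/// Hypothetical helper: sizes in bytes of a thin reference and a slice reference.
fn pointer_sizes() -> (usize, usize) {
    (size_of::<&u8>(), size_of::<&[u8]>())
}

fn main() {
    let (thin, fat) = pointer_sizes();
    // A thin pointer is one machine word: just the address.
    assert_eq!(thin, size_of::<usize>());
    // A slice reference is two words: the address plus the length.
    assert_eq!(fat, 2 * size_of::<usize>());
    // &str has the same two-word shape as &[u8].
    assert_eq!(size_of::<&str>(), fat);
    println!("&u8: {} bytes, &[u8]: {} bytes", thin, fat);
}
```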

Expanding:

  • A [T] is a slice of Ts. For any given T, as long as T is itself a sized type ***, we can imagine a type [T].
  • A str is a slice of a string. It is guaranteed to be valid UTF-8 text, and that's everything that separates it from a [u8]. Rust could have dumped the valid UTF-8 guarantee and just defined everything else in str as part of [u8].

Well, since you can't own a slice locally ****, you might be wondering how we create slices.

The answer is that we put the data in something with the size already known, and then borrow slices from that.

Take for example:

let my_array: [u32; 3] = [1, 2, 3];

We can slice my_array into a [u32] like so:

let my_slice: [u32] = my_array[..]; // does not compile: a [u32] has no size known at compile time

But by slicing we lose size information statically, and since local variables end up on the stack which requires a predetermined size, we must put it under a reference:

let my_slice: &[u32] = &my_array[..];

The point of a slice is that it's a very flexible (barring lifetimes) method of working with contiguous blocks of data, no matter where the data comes from. I could've just as easily made my_array a Vec<u32>, which is heap-allocated, and it would still have worked.
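As a rough illustration of that flexibility, the sketch below borrows slices from both a stack array and a heap-allocated Vec and hands them to the same function. `largest` is a hypothetical helper invented for this example:

```rust
/// Hypothetical helper: the largest element of any slice of u32s, if there is one.
fn largest(values: &[u32]) -> Option<u32> {
    values.iter().copied().max()
}

fn main() {
    let my_array: [u32; 3] = [1, 2, 3];      // stored inline, on the stack
    let my_vec: Vec<u32> = vec![10, 20, 30]; // elements stored on the heap

    let whole: &[u32] = &my_array[..]; // every element of the array
    let tail: &[u32] = &my_vec[1..];   // everything after the first element

    // The length travels with the reference, not with the type.
    assert_eq!(whole.len(), 3);
    assert_eq!(tail.len(), 2);
    assert_eq!(tail, [20, 30]);

    // The same function accepts both, regardless of where the data lives.
    assert_eq!(largest(whole), Some(3));
    assert_eq!(largest(tail), Some(30));
    assert_eq!(largest(&[]), None);
}
```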

What is the difference between &String and &str?

&String is a reference to the whole string object. The string object in Rust is essentially a Vec<u8>. A Vec contains a pointer to the data it "contains", so your &String could be thought of as a &&str. And, that is why we could do either of the following:

let my_string: String = "Abc".to_string();

let my_str: &str = &my_string[..]; // As explained previously
// OR
let my_str: &str = &*my_string;

The explanation of this brings me to your last question:

What is the deref trait?

The Deref trait is a trait which describes the dereference (*) operator. As you saw above, I was able to do *my_string. That's because String implements Deref<Target = str>, which allows you to dereference the String into a str. Similarly, I can dereference a Vec<T> into a [T].

Note however, that the Deref trait is used in more places than just where * is used:

let my_string: String = "Abc".to_string();

let my_str: &str = &my_string;

If I try to assign a value of type &T into a place of type &U, then Rust will try to dereference my T, as many times as it takes to get a U, while still keeping at least one reference. Similarly, if I had a &&&&....&&&&T, and I tried to assign it to a &&&&....&&&&U, it would still work.

This is called deref coercion: automatically turning a &T into a &U, where some amount of *T would result in a U.
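To see the chain of dereferences at work, here is a small sketch with a user-defined type. `Name` and `byte_len` are hypothetical names made up for illustration; only `Deref` itself comes from the standard library:

```rust
use std::ops::Deref;

/// Hypothetical wrapper type, used only to illustrate `Deref`.
struct Name(String);

impl Deref for Name {
    type Target = String;

    fn deref(&self) -> &String {
        &self.0
    }
}

/// Hypothetical helper that only accepts a `&str`.
fn byte_len(s: &str) -> usize {
    s.len()
}

fn main() {
    let name = Name("Abc".to_string());

    // &Name -> &String -> &str: the compiler inserts both deref steps.
    assert_eq!(byte_len(&name), 3);

    // The same conversion written out by hand.
    let explicit: &str = &**name;
    assert_eq!(explicit, "Abc");

    // Extra layers of references are peeled off as well.
    let nested: &&Name = &&name;
    assert_eq!(byte_len(nested), 3);
}
```

The wrapper derefs to String rather than straight to str so that the coercion has to take more than one step.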


  • *: Raw pointers *const T and *mut T are the same size as references, but are treated as opaque by the compiler. The compiler doesn't make any guarantees about what is behind a raw pointer, or even that they're correctly aligned. Hence, they are unsafe to dereference. But since the Deref trait defines a deref method which is safe, dereferencing a raw pointer is special, and will not be done automatically either.
  • **: This includes other dynamically sized types too, such as trait objects, and extern types. This also includes structs which contain a dynamically sized type as their last member as well, although these are very difficult to correctly construct, but will become easier in the future with the CoerceUnsized trait. It is possible to invalidate all of this (Except for extern types) with the use of the unsized_locals nightly feature which allows some use of dynamically sized locals.
  • ***: Sized types are all types whose size is known at compile time. You can identify them generically; given a type T, T's size is known at compile time if T: Sized. If T: ?Sized, then its size may not be known at compile time (T: ?Sized is the most flexible requirement for callers since it accepts anything). Since a slice requires the data inside to be contiguous, and homogeneous in size and type, dynamically sized types (or !Sized) aren't possible to contain within a slice, or an array, or a Vec<T>, and maintain O(1) indexing. While Rust could probably write special code for indexing into a group of dynamically sized types, it currently doesn't.
  • ****: You actually can own a slice, it just has to be under a pointer which owns it. This can be, for example, a Box<[T]>, or a Rc<[T]>. These will deallocate the slice on their own (A Box when dropped, and a Rc when all strong and weak references of an Rc are dropped (The value's destructor is called when all strong references are dropped, but the memory isn't freed until all weak references are gone, too.)).
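A short sketch of the last footnote, showing a slice owned through a Box and through an Rc. `into_owned_slice` is a hypothetical helper name; `Vec::into_boxed_slice` and `Rc::from` are the standard library conversions being exercised:

```rust
use std::rc::Rc;

/// Hypothetical helper: turns a Vec into an owning pointer to a slice.
fn into_owned_slice(values: Vec<u32>) -> Box<[u32]> {
    values.into_boxed_slice()
}

fn main() {
    // Box<[u32]> owns the slice and frees it when the Box is dropped.
    let boxed: Box<[u32]> = into_owned_slice(vec![1, 2, 3]);
    assert_eq!(boxed.len(), 3);
    assert_eq!(&boxed[..], [1, 2, 3]);

    // Rc<[u32]> shares ownership of one slice between several handles.
    let shared: Rc<[u32]> = Rc::from(vec![4, 5, 6]);
    let other = Rc::clone(&shared);
    assert_eq!(Rc::strong_count(&shared), 2);
    assert_eq!(&other[..], [4, 5, 6]);
}
```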
Optimistic Peach
  • 1
    Quoting the [book](https://doc.rust-lang.org/book/ch04-03-slices.html): _Another data type that does not have ownership is the slice. Slices let you reference a contiguous sequence of elements in a collection rather than the whole collection._ So `[u8]` is not a slice, it's an unsized array; `&[u8]` is a slice. – Aloso Apr 12 '20 at 05:52
  • 3
    The book is calling `&[T]` a slice because it is easier to understand for beginners, who are only new to the language. The [reference page for a slice](https://doc.rust-lang.org/stable/reference/types/slice.html) states that _The slice type is written as `[T]`_, and [the documentation for a slice](https://doc.rust-lang.org/std/primitive.slice.html) describes it as _A dynamically-sized view into a contiguous sequence, `[T]`_. – Optimistic Peach Apr 12 '20 at 18:26
  • This is a nice answer, but this part is not fully clear: "_But since we can't own a local variable whose size isn't already known, we must put it under a reference_". When you do `let my_slice: [u32] = my_array[..];`, `my_array` has a known size. I am also not sure that the verb "own" is appropriate in this case because of its special meaning in Rust. I don't think you're using it in the Rust sense here. – nbro Jan 01 '23 at 00:10
  • @nbro Fair enough, I've edited it to more precisely mean the point I wanted. – Optimistic Peach Jan 02 '23 at 01:57
6

What is a reference

A reference is like a pointer from C (which represents a memory location), but references are never invalid* (i.e. null) and you can't do pointer arithmetic on references. Rust's references are pretty similar to C++'s references. One important motivating reason to use references is to avoid moving or cloning variables. Let's say you have a function that calculates the sum of a vector (note: this is a toy example, the right way to get the sum of a vector is nums.iter().sum()):

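fn sum(nums: Vec<u32>) -> Option<u32> {
    // An empty vector has no sum to report.
    if nums.is_empty() {
        return None;
    }
    let mut sum = 0;
    // Iterating by value consumes the vector.
    for num in nums {
        sum += num;
    }
    Some(sum) // no trailing semicolon, so this is the return value
}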

This function moves the vector, so it is unusable afterward.

let nums = vec![1, 2, 3, 4, 5];
assert_eq!(sum(nums), Some(15));
assert_eq!(nums[0], 1); // <-- error, nums was moved when we calculated sum

The solution is to pass a reference to a vector:

fn sum(nums: &Vec<u32>) -> Option<u32> {
...
}
let nums = vec![1, 2, 3, 4, 5];
assert_eq!(sum(&nums), Some(15));
assert_eq!(nums[0], 1); // <-- it works!

What is a slice

A slice is "a view into a [contiguous] block of memory represented as a pointer and a length." It can be thought of as a reference to an array (or array-like thing). Part of Rust's safety guarantee is ensuring you don't access elements past the end of your array. To accomplish this, slices are represented internally as a pointer and a length. This is fat compared to pointers, which contain no length information. Similar to the sum example above, if nums were an array, rather than a Vec, you would pass a slice to sum(), rather than the array itself.
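A minimal sketch of what that could look like: the toy `sum` from above, rewritten to take a slice. The None-on-empty behaviour is carried over from the earlier version; the rest is one possible way to write it, not the only one:

```rust
/// Sums a slice of numbers; returns None for an empty slice, like the toy `sum` above.
fn sum(nums: &[u32]) -> Option<u32> {
    if nums.is_empty() {
        return None;
    }
    Some(nums.iter().sum())
}

fn main() {
    let array = [1, 2, 3, 4, 5];
    let vector = vec![1, 2, 3, 4, 5];

    assert_eq!(sum(&array), Some(15));       // &[u32; 5] coerces to &[u32]
    assert_eq!(sum(&vector), Some(15));      // &Vec<u32> coerces to &[u32]
    assert_eq!(sum(&vector[1..3]), Some(5)); // only the elements 2 and 3
    assert_eq!(sum(&[]), None);

    // Both collections were only borrowed, so they are still usable.
    assert_eq!(array[0], 1);
    assert_eq!(vector[0], 1);
}
```

Taking &[u32] instead of &Vec<u32> costs nothing and lets callers pass arrays, vectors, or parts of either.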

String vs str

A str is a sequence of UTF-8 encoded bytes whose length isn't known at compile time, and an &str is a slice (pointer plus length) of such bytes. String is essentially a Vec of UTF-8 encoded bytes, and String implements Deref<Target=str>, which means that an &String behaves a lot like (coerces to) an &str. This is similar to how &Vec<u32> behaves like &[u32] (Vec<T> implements Deref<Target=[T]>).
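In practice this means a function that takes &str can be called with either kind of string. A small sketch, where `first_word` is a hypothetical helper written for this example:

```rust
/// Hypothetical helper: the first whitespace-separated word, or "" if there is none.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let owned: String = String::from("hello world"); // heap-allocated, growable
    let literal: &str = "good morning";              // borrowed from the program binary

    assert_eq!(first_word(&owned), "hello");      // &String coerces to &str
    assert_eq!(first_word(literal), "good");      // already a &str
    assert_eq!(first_word(&owned[6..]), "world"); // a sub-slice of the String
}
```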


* unless made invalid with unsafe Rust

asky
  • 2
    P.S. to be clear, slices are also references. When you're generic over a type `T`, then `&T` is _definitely_ a reference, and it _might_ also be a slice. – Aloso Apr 12 '20 at 06:03