
Answers to "What are the differences between Rust's `String` and `str`?" describe how &str and String relate to each other.

What is surprising is that a str is more limited than a fixed-sized array, because it cannot be declared as a local variable. Compiling

let arr_owned = [0u8; 32];
let arr_slice = &arr_owned;

let str_slice = "apple";
let str_owned = *str_slice;

in Rust 1.32.0, I get

error[E0277]: the size for values of type `str` cannot be known at compilation time
 --> src/lib.rs:6:9

which is confusing, because the compiler can know the size of "apple"; that size is just not part of the str type.

Is there a linguistic reason for the asymmetry between Vec<T> <-> [T; N] and String <-> str owned types? Could an str[N] type, which would be a shorthand for a [u8; N] that only contains provably valid UTF-8 encoded strings, replace str without breaking lots of existing code?

Shepmaster
wigy
  • `str` is actually more comparable to an *unsized slice* `[T]`, rather than a fixed size array `[T; N]`. And you can't have a variable of type `[u8]` either. – E_net4 Feb 13 '19 at 15:03
  • And what would be the point of such a feature? – Stargateur Feb 13 '19 at 15:09
  • You can use [smallstr](https://docs.rs/smallstr/0.1.0/smallstr/struct.SmallString.html) or something like that if you want to optimize the string allocations. – Boiethios Feb 13 '19 at 15:12
  • I got great insights from these comments and answers. We had 2 type aliases: `type A = String;` and `type B = Vec` in our code. And a method taking `&A` triggered clippy. That was when I started looking for these parallels between `&str` and `&[u8]` and this slight asymmetry in the type-system. – wigy Feb 13 '19 at 15:16

2 Answers


asymmetry between Vec<T> <-> [T; N] and String <-> str

That's because you confused something here. The relationships are rather like this:

  • Vec<T> <-> [T]
  • String <-> str

In all four of those types, the length is stored at runtime, not at compile time. Fixed-size arrays ([T; N]) differ in that regard: their length is part of the type, so it is known at compile time and not stored at runtime!
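The difference in where the length lives can be observed through pointer sizes. The following is a minimal sketch; the helper `pointer_words` is a name made up for this illustration, not something from the standard library.

```rust
use std::mem::size_of;

/// Hypothetical helper: how many machine words a `&T` occupies.
fn pointer_words<T: ?Sized>() -> usize {
    size_of::<&T>() / size_of::<usize>()
}

fn main() {
    // A fixed-size array keeps its length in the type; the value is just N elements.
    assert_eq!(size_of::<[u8; 5]>(), 5);

    // A reference to a fixed-size array is a thin pointer: no length at runtime.
    assert_eq!(pointer_words::<[u8; 5]>(), 1);

    // References to `[u8]` and `str` are fat pointers: address plus runtime length.
    assert_eq!(pointer_words::<[u8]>(), 2);
    assert_eq!(pointer_words::<str>(), 2);

    // `Vec<T>` and `String` also track their length in runtime fields.
    let v: Vec<u8> = vec![1, 2, 3];
    let s = String::from("apple");
    assert_eq!(v.len(), 3);
    assert_eq!(s.len(), 5);
}
```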

And indeed, both [T] and str can't be stored on the stack, because they are both unsized.
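As a rough illustration, unsized values can still be used behind a pointer, whether borrowed or owned. The function name `boxed` is only an assumption for this sketch.

```rust
/// Hypothetical helper: copies a string slice into an owned, non-growable `Box<str>`.
fn boxed(s: &str) -> Box<str> {
    Box::from(s)
}

fn main() {
    // let owned: str = *"apple"; // rejected with E0277, as in the question

    let borrowed: &str = "apple";            // points into static memory
    let on_heap: Box<str> = boxed(borrowed); // owns a heap copy of the bytes
    let bytes: Box<[u8]> = Box::from(&b"apple"[..]);

    assert_eq!(&*on_heap, "apple");
    assert_eq!(bytes.len(), 5);
}
```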

Could an str[N] type, which would be a shorthand to a [u8; N] that only contains provably valid UTF-8 encoded strings, replace str without breaking lots of existing code?

It wouldn't replace str, but it could be an interesting addition indeed! But there are probably reasons why it doesn't exist yet, e.g. because the length of a Unicode string is usually not really relevant. In particular, it usually doesn't make sense to "take a Unicode string with exactly three bytes".
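For what it's worth, something like str[N] can be approximated in a library with const generics (stable since Rust 1.51). This is only a sketch under that assumption; `FixedStr` and its methods are hypothetical names, not an existing type.

```rust
use std::str;

/// Hypothetical `str[N]`-like type: exactly `N` bytes of valid UTF-8.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct FixedStr<const N: usize>([u8; N]);

impl<const N: usize> FixedStr<N> {
    /// Returns `None` unless `s` is exactly `N` bytes long.
    fn new(s: &str) -> Option<Self> {
        if s.len() != N {
            return None;
        }
        let mut bytes = [0u8; N];
        bytes.copy_from_slice(s.as_bytes());
        Some(FixedStr(bytes))
    }

    fn as_str(&self) -> &str {
        // The bytes were copied from a `&str`, so this check always succeeds.
        str::from_utf8(&self.0).expect("constructed from valid UTF-8")
    }
}

fn main() {
    let apple = FixedStr::<5>::new("apple").unwrap();
    assert_eq!(apple.as_str(), "apple");

    // "é" is one character but two bytes, so it fits N = 2 and not N = 1.
    assert!(FixedStr::<1>::new("é").is_none());
    assert!(FixedStr::<2>::new("é").is_some());
}
```

The last two assertions hint at why a byte-length parameter is awkward for text: the number of bytes rarely matches what a user thinks of as the string's length.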

Shepmaster
Lukas Kalbertodt
  • The link to the Unicode string processing is quite useful. Living in a locale which writes names and dates in strange orders, has non-ascii characters, I thought I saw most problems with localization. Boy, I was wrong. – wigy Feb 13 '19 at 15:28

[T] and str can't be stored on the stack, because they are both unsized

While this is true today, it may not be true in the future. RFC 1909 introduces unsized rvalues. One of the powers that this feature would give is variable-length arrays:

The RFC also describes an extension to the array literal syntax: [e; dyn n]. In the syntax, n isn't necessarily a constant expression. The array is dynamically allocated on the stack

No mention is made of whether a string will be directly possible, but one could always create a stack-allocated array of bytes to be used as storage for a string.
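On stable Rust today, the byte-array approach might look like the following sketch. The function `copy_to_stack` is a hypothetical helper written for this example, and the buffer size of 32 is an arbitrary choice.

```rust
use std::str;

/// Hypothetical helper: copies `s` into `buf` and returns a `&str` view of
/// the copied bytes, or `None` if `buf` is too small.
fn copy_to_stack<'a>(s: &str, buf: &'a mut [u8]) -> Option<&'a str> {
    let dst = buf.get_mut(..s.len())?;
    dst.copy_from_slice(s.as_bytes());
    // The bytes came from a `&str`, so validation is expected to succeed.
    str::from_utf8(dst).ok()
}

fn main() {
    let mut buf = [0u8; 32]; // fixed-size storage in a local variable
    let s = copy_to_stack("apple", &mut buf).unwrap();
    assert_eq!(s, "apple");

    let mut tiny = [0u8; 2];
    assert!(copy_to_stack("apple", &mut tiny).is_none());
}
```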

Shepmaster