
I need to construct a String from an array of bytes (not a Vec). This works:

let buf2 = [30, 40, 50];
let string2 = std::str::from_utf8(&buf2).unwrap().to_string();

  1. Why is there no dedicated method for arrays/slices in String?
  2. Why is the parameter of from_utf8 not generic?
  3. Is the snippet above idiomatic Rust?

I ended up not needing the String and going with &str, but the questions remain.

Shepmaster
nicolai
  • _I need to construct String from array of bytes(not Vec)_ why? – Ömer Erden Aug 31 '21 at 11:00
  • why not generic => no need => https://stackoverflow.com/questions/40006219/why-is-it-discouraged-to-accept-a-reference-to-a-string-string-vec-vec-o – Stargateur Aug 31 '21 at 11:04
  • @ÖmerErden, as I said in "PS", it turned out that I didn't need such a conversion, but was still curious about the "(a)" question. – nicolai Aug 31 '21 at 11:21
  • Well question was not clear to me, there could be 2 possibilities, you might be trying to avoid heap allocation(which is impossible if you are creating String) or the other case which current answer explains. For the (b), I am sure you used `unwrap` for an example so it looks good from my perspective. – Ömer Erden Aug 31 '21 at 11:43
  • @ÖmerErden I'm just learning Rust and trying to understand why things are as they are. – nicolai Aug 31 '21 at 11:49

1 Answer


There are two from_utf8 methods. One goes from &[u8] to &str, the other from Vec<u8> to String. Why two? What's the difference? And why isn't there one to go straight from &[u8] to String?

Cheap conversions

Let's consult the official Rust docs.

str::from_utf8(v: &[u8]) -> Result<&str, Utf8Error>

A string slice (&str) is made of bytes (u8), and a byte slice (&[u8]) is made of bytes, so this function converts between the two. Not all byte slices are valid string slices, however: &str requires that it is valid UTF-8. from_utf8() checks to ensure that the bytes are valid UTF-8, and then does the conversion.

Source

If a &[u8] byte slice contains valid UTF-8 data, a &str string slice can be created by simply using the bytes as the string data. It's a very cheap operation, no allocation required.
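A minimal sketch of the cheap direction, assuming a recent stable Rust toolchain (1.58+ for inline format arguments). The helper name `as_text` is hypothetical, a thin wrapper around `std::str::from_utf8`. It also bears on question 2: `&buf` is a `&[u8; 5]` that coerces to `&[u8]` at the call site, so the non-generic parameter already accepts arrays, `Vec`s and slices. The pointer comparison shows that the resulting `&str` points at the array's own bytes.

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: borrow bytes as a string slice.
/// Validates UTF-8 but never allocates or copies.
fn as_text(bytes: &[u8]) -> Result<&str, Utf8Error> {
    str::from_utf8(bytes)
}

fn main() {
    // A fixed-size array, as in the question: "hello" in ASCII.
    let buf: [u8; 5] = [104, 101, 108, 108, 111];

    // `&buf` is a `&[u8; 5]`; it coerces to `&[u8]` automatically.
    let s = as_text(&buf).expect("valid UTF-8");

    // Same address as the array: the &str borrows the existing bytes.
    assert_eq!(s.as_ptr(), buf.as_ptr());
    println!("{s}");

    // Invalid UTF-8 is rejected instead of converted.
    let bad: [u8; 3] = [0x66, 0x6f, 0xff];
    let err = as_text(&bad).unwrap_err();
    println!("valid up to byte {}", err.valid_up_to());
}
```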

String::from_utf8(vec: Vec<u8>) -> Result<String, FromUtf8Error>

Converts a vector of bytes to a String. ... This method will take care to not copy the vector, for efficiency’s sake.

Source

The same thing goes for String's method. A String is an owned type: it needs to own the underlying bytes, not just point at someone else's bytes. If it were to take a &[u8], it would have to allocate memory and copy the bytes. However, if you already have an owned Vec<u8>, then converting from Vec<u8> to String is a cheap operation. String can consume the Vec<u8> and reuse its existing heap buffer. No allocation required.
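A sketch of the owned direction, where `into_text` is a hypothetical wrapper around `String::from_utf8`. Recording the pointer and capacity before the move and comparing them afterwards illustrates the documented "will not copy the vector" behaviour. On failure, `FromUtf8Error::into_bytes` hands the original vector back, so the bytes are not lost.

```rust
use std::string::FromUtf8Error;

/// Hypothetical helper: turn an owned byte vector into an owned String.
/// The Vec is moved in, and its heap buffer becomes the String's buffer.
fn into_text(bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes)
}

fn main() {
    let v: Vec<u8> = b"hello".to_vec();
    // Remember where the Vec's buffer lives before moving it.
    let (ptr, cap) = (v.as_ptr(), v.capacity());

    let s = into_text(v).expect("valid UTF-8");
    // Same pointer and capacity: the buffer was reused, not copied.
    assert_eq!(s.as_ptr(), ptr);
    assert_eq!(s.capacity(), cap);

    // On invalid input the original Vec is returned inside the error.
    let bad: Vec<u8> = vec![0x66, 0x6f, 0xff];
    let recovered: Vec<u8> = into_text(bad).unwrap_err().into_bytes();
    println!("{s} / {recovered:?}");
}
```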

Explicit heap allocation and copying

Rust wants you to pay attention to memory allocation and copying, so from_utf8 is only provided for the cheap conversions. Any allocation or copying requires an extra method call. It's elegant: the fast path is convenient, the slow path more cumbersome. You either need to:

  1. Convert your &[u8] to a &str (cheap) and then convert that to an owned String (expensive); or
  2. Convert your &[u8] to an owned Vec<u8> (expensive) and then convert that to a String (cheap).

Either way, it's your choice, and it requires a second method call.
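The two routes listed above, sketched for a byte array as in the question; `via_str` and `via_vec` are hypothetical names. Each route allocates exactly once. The practical difference is ordering: route 1 validates the borrowed bytes before allocating, so invalid input costs no allocation, while route 2 copies first and validates afterwards. `to_string()`, as used in the question, would work in place of `to_owned()`.

```rust
use std::str::{self, Utf8Error};
use std::string::FromUtf8Error;

/// Route 1 (hypothetical name): validate the borrowed bytes (cheap),
/// then copy them into a newly allocated String (expensive).
fn via_str(bytes: &[u8]) -> Result<String, Utf8Error> {
    str::from_utf8(bytes).map(|s| s.to_owned())
}

/// Route 2 (hypothetical name): copy the bytes into a new Vec (expensive),
/// then validate and reuse that buffer as the String (cheap).
fn via_vec(bytes: &[u8]) -> Result<String, FromUtf8Error> {
    String::from_utf8(bytes.to_vec())
}

fn main() {
    let buf = [104u8, 105]; // "hi"
    let a = via_str(&buf).expect("valid UTF-8");
    let b = via_vec(&buf).expect("valid UTF-8");
    assert_eq!(a, b);
    println!("{a} {b}");
}
```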

John Kugelman