251

I am trying to write a simple TCP/IP client in Rust, and I need to print out the buffer I got from the server.

How do I convert a Vec<u8> (or a &[u8]) to a String?
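Here is a minimal sketch for the TCP client case in the question. It uses a hypothetical helper `read_chunk` written against the generic `std::io::Read` trait, which `std::net::TcpStream` implements, so you can try it without a server. The important detail is to decode only the `n` bytes that `read` reports, not the whole buffer.

```rust
use std::io::Read;

/// Hypothetical helper: reads one chunk from `source` and returns it as text.
/// Invalid UTF-8 is replaced with U+FFFD instead of producing an error.
fn read_chunk<R: Read>(source: &mut R) -> std::io::Result<String> {
    let mut buf = [0u8; 1024];
    // `read` returns how many bytes it actually wrote into `buf`.
    let n = source.read(&mut buf)?;
    // Decode only the filled part; the rest of `buf` is still zeroes.
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}

fn main() -> std::io::Result<()> {
    // `&[u8]` implements `Read`, standing in for a `TcpStream` here.
    let mut fake_server: &[u8] = b"HTTP/1.0 200 OK";
    let text = read_chunk(&mut fake_server)?;
    println!("result: {}", text);
    Ok(())
}
```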

Peter Mortensen
Athabaska Dick

6 Answers

260

To convert a slice of bytes to a string slice (assuming a UTF-8 encoding):

use std::str;

//
// pub fn from_utf8(v: &[u8]) -> Result<&str, Utf8Error>
//
// Assuming buf: &[u8]
//

fn main() {

    let buf = &[0x41u8, 0x41u8, 0x42u8];

    let s = match str::from_utf8(buf) {
        Ok(v) => v,
        Err(e) => panic!("Invalid UTF-8 sequence: {}", e),
    };

    println!("result: {}", s);
}

The conversion does not copy or allocate: the resulting &str borrows buf. It does still scan the bytes to check that they are valid UTF-8. If you need a String, call .to_owned() on the string slice (String::from(s) and s.to_string() also work).
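The sketch below shows the borrow-then-own pattern, using a hypothetical helper `to_owned_string`. The `debug_assert_eq!` checks that the `&str` points at the same bytes as `buf`, which means no copy happens before `to_owned()`.

```rust
use std::str;

/// Hypothetical helper: borrows `buf` as `&str` (no copy), then makes an owned `String` (one copy).
fn to_owned_string(buf: &[u8]) -> Result<String, str::Utf8Error> {
    // Validates the bytes; on success `s` points into `buf` itself.
    let s: &str = str::from_utf8(buf)?;
    debug_assert_eq!(s.as_ptr(), buf.as_ptr());
    // `to_owned()` allocates a new buffer and copies the bytes into it.
    // `String::from(s)` and `s.to_string()` are equivalent alternatives.
    Ok(s.to_owned())
}

fn main() {
    let buf = &[0x41u8, 0x41u8, 0x42u8];
    match to_owned_string(buf) {
        Ok(s) => println!("result: {}", s),
        Err(e) => eprintln!("Invalid UTF-8 sequence: {}", e),
    }
}
```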

If you are sure that the byte slice is valid UTF-8, and you don't want to incur the overhead of the validity check, there is an unsafe version of this function, from_utf8_unchecked, which has the same behavior but skips the check. The caller is then responsible for guaranteeing that the bytes really are valid UTF-8.
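This sketch shows one case where skipping the check can be justified. The hypothetical `digits` function builds its buffer only from ASCII digits. In real code `n.to_string()` already does this; the function is just a vehicle for the `SAFETY` comment, which records why the invariant holds.

```rust
use std::str;

/// Hypothetical example: renders `n` as decimal digits via a byte buffer.
/// Every byte pushed is in b'0'..=b'9' (ASCII), so the buffer is valid UTF-8
/// by construction and the validity check can be skipped.
fn digits(mut n: u32) -> String {
    let mut buf: Vec<u8> = Vec::new();
    loop {
        buf.push(b'0' + (n % 10) as u8);
        n /= 10;
        if n == 0 {
            break;
        }
    }
    // Digits were produced least-significant first.
    buf.reverse();
    // SAFETY: `buf` contains only ASCII digits, which are always valid UTF-8.
    // Passing bytes that are not valid UTF-8 here would be undefined behavior.
    let s: &str = unsafe { str::from_utf8_unchecked(&buf) };
    s.to_owned()
}

fn main() {
    println!("result: {}", digits(1234));
}
```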

If you need a String instead of a &str, you may also consider String::from_utf8, which takes ownership of a Vec<u8> instead of borrowing a slice.

The library reference for the conversion function: std::str::from_utf8.

BinaryButterfly
gavinb
  • You may want to add that this is possible because Vec coerces to slices – torkleyy Apr 13 '17 at 07:49
  • Although it's true that `from_utf8` doesn't allocate, it may be worth mentioning that it needs to scan the data to validate UTF-8 correctness. So this is not an O(1) operation (which one may think at first) – Zargony Jan 24 '19 at 12:16
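Both comments can be illustrated with a small sketch built around a hypothetical helper `valid_prefix`. First, `&v` for a `Vec<u8>` deref-coerces to `&[u8]`. Second, on failure `Utf8Error::valid_up_to()` reports where the first invalid sequence begins, which reflects that validation walks through the bytes.

```rust
use std::str;

/// Hypothetical helper: returns the longest valid UTF-8 prefix of `bytes`.
fn valid_prefix(bytes: &[u8]) -> &str {
    match str::from_utf8(bytes) {
        Ok(s) => s,
        Err(e) => {
            // `valid_up_to()` is the index where the first invalid sequence starts.
            let end = e.valid_up_to();
            // The bytes before `end` were already validated, so this cannot fail.
            str::from_utf8(&bytes[..end]).unwrap()
        }
    }
}

fn main() {
    let v: Vec<u8> = vec![b'h', b'i', 0xFF, b'!'];
    // `&v` is a `&Vec<u8>`, which deref-coerces to `&[u8]`.
    println!("{}", valid_prefix(&v)); // prints "hi"
    // Explicit alternatives to the coercion: `v.as_slice()` or `&v[..]`.
    assert_eq!(valid_prefix(v.as_slice()), valid_prefix(&v[..]));
}
```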
160

I prefer String::from_utf8_lossy:

fn main() {
    let buf = &[0x41u8, 0x41u8, 0x42u8];
    let s = String::from_utf8_lossy(buf);
    println!("result: {}", s);
}

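It turns invalid UTF-8 sequences into � (U+FFFD), so no error handling is required. That's good for when you don't need error handling, and I rarely do. Strictly speaking, you get a Cow<str> rather than a String: it borrows the input when the bytes are already valid UTF-8 and allocates only when replacements are needed. Either way, it should make printing out what you're getting from the server a little easier.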

Sometimes you may need to call into_owned() to get a String, since the result is a Cow (clone on write).
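The sketch below covers both cases, using a hypothetical helper `lossy_owned`. `from_utf8_lossy` borrows when no replacement is needed and allocates otherwise. `into_owned()` produces a `String` in both cases.

```rust
use std::borrow::Cow;

/// Hypothetical helper: decodes `bytes` lossily and always returns an owned `String`.
fn lossy_owned(bytes: &[u8]) -> String {
    // `from_utf8_lossy` returns `Cow<str>`:
    //  - `Cow::Borrowed` when `bytes` is already valid UTF-8 (no allocation),
    //  - `Cow::Owned` when invalid sequences were replaced with U+FFFD.
    // `into_owned()` yields a `String` either way, copying only if borrowed.
    String::from_utf8_lossy(bytes).into_owned()
}

fn main() {
    let good = String::from_utf8_lossy(b"AAB");
    let bad = String::from_utf8_lossy(&[0x41, 0xFF, 0x42]);
    assert!(matches!(good, Cow::Borrowed(_)));
    assert!(matches!(bad, Cow::Owned(_)));
    println!("{} {}", good, bad); // prints "AAB A�B"
    println!("{}", lossy_owned(&[0x41, 0xFF, 0x42]));
}
```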

Shepmaster
Bjorn
  • Thanks a lot for the `into_owned()` suggestion! It was exactly what I was looking for (this makes it become a proper `String` which you can return as a return value from a method, for example). – Per Lundberg Nov 25 '16 at 16:34
  • � is Unicode U+FFFD (UTF-8 sequence 0xEF 0xBF 0xBD (octal 357 277 275)), '[REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280)'. In some text editors it can be searched for in regular expression mode by `\x{FFFD}`. – Peter Mortensen Apr 10 '21 at 19:18
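You can confirm those bytes with a hypothetical helper `replacement_bytes`. It lossily decodes a lone 0xFF byte, which is never valid in UTF-8, and inspects the result.

```rust
/// Hypothetical helper: the UTF-8 bytes that `from_utf8_lossy` inserts for an invalid sequence.
fn replacement_bytes() -> Vec<u8> {
    // A lone 0xFF byte can never appear in valid UTF-8, so it gets replaced.
    String::from_utf8_lossy(&[0xFF]).into_owned().into_bytes()
}

fn main() {
    // U+FFFD REPLACEMENT CHARACTER encodes to the three bytes EF BF BD.
    assert_eq!(replacement_bytes(), "\u{FFFD}".as_bytes());
    println!("{:x?}", replacement_bytes()); // prints "[ef, bf, bd]"
}
```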
94

If you actually have a vector of bytes (Vec<u8>) and want to convert it to a String, the most efficient approach is to reuse the allocation with String::from_utf8:

fn main() {
    let bytes = vec![0x41, 0x42, 0x43];
    let s = String::from_utf8(bytes).expect("Found invalid UTF-8");
    println!("{}", s);
}
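This sketch checks the "reuse the allocation" claim, using a hypothetical wrapper `vec_to_string`. On success, the String's data pointer equals the original Vec's. On failure, `FromUtf8Error::into_bytes` hands the untouched vector back.

```rust
/// Hypothetical wrapper: converts `bytes` into a `String` without copying if it is valid UTF-8.
/// On failure the original vector is returned unchanged.
fn vec_to_string(bytes: Vec<u8>) -> Result<String, Vec<u8>> {
    // `String::from_utf8` takes ownership of the Vec and reuses its heap buffer.
    // `FromUtf8Error::into_bytes` gives that same Vec back if validation fails.
    String::from_utf8(bytes).map_err(|e| e.into_bytes())
}

fn main() {
    let bytes = vec![0x41, 0x42, 0x43];
    let ptr = bytes.as_ptr();
    let s = vec_to_string(bytes).expect("Found invalid UTF-8");
    // Same heap address: the allocation was reused, not copied.
    assert_eq!(s.as_ptr(), ptr);
    println!("{}", s);

    // Invalid input: the bytes come back intact for other handling.
    let bad = vec_to_string(vec![0x41, 0xFF]).unwrap_err();
    assert_eq!(bad, vec![0x41, 0xFF]);
}
```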
Shepmaster
  • Edit: Note that, as mentioned by @Bjorn Tipling, you might think you can use `String::from_utf8_lossy` instead here, so that you don't need the `expect` call, but the input to that is a slice of bytes (`&'a [u8]`). OTOH, there's also `from_utf8_unchecked`. "If you are sure that the byte slice is valid UTF-8, and you don't want to incur the overhead of the conversion, there is an unsafe version of this function [`from_utf8`], `from_utf8_unchecked`, which has the same behavior but skips the checks." – James Ray Jan 23 '19 at 09:22
  • Note that you can use `&vec_of_bytes` to convert back into a slice of bytes, as listed in the examples of `from_utf8_lossy`: https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy – James Ray Jan 23 '19 at 09:32
  • @JamesRay is there a way to get the behavior of `from_utf8_lossy` without reallocating? If I start with a `Vec` and then take a reference to it before converting it to a string as in `String::from_utf8_lossy(&my_vec)` I will end up reallocating memory when I don't actually need to. – Michael Dorst Dec 07 '21 at 06:49
  • Oh, never mind. `from_utf8_lossy` returns a `Cow`, not a String. If there are no invalid byte sequences, it won't reallocate, but if there are, it will. – Michael Dorst Dec 07 '21 at 06:55
10

In my case, I just needed to turn the numbers into a string, not the numbers into letters according to some encoding, so I did:

fn main() {
    let bytes = vec![0x41, 0x42, 0x43];
    let s = format!("{:?}", &bytes);
    println!("{}", s);
}
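Building on this, here is a sketch of a few ways to render bytes as numbers, using a hypothetical helper `show_bytes`. The `{:x?}` and zero-padded hex forms are often easier to compare against a protocol dump.

```rust
/// Hypothetical helper: formats raw bytes as numbers instead of decoding them as text.
fn show_bytes(bytes: &[u8]) -> (String, String, String) {
    // Debug formatting prints each byte in decimal, e.g. "[65, 66, 67]".
    let decimal = format!("{:?}", bytes);
    // `x?` is Debug formatting with integers in lowercase hex, e.g. "[41, 42, 43]".
    let hex = format!("{:x?}", bytes);
    // A space-separated, zero-padded hex dump, e.g. "41 42 43".
    let dump = bytes
        .iter()
        .map(|b| format!("{:02x}", b))
        .collect::<Vec<String>>()
        .join(" ");
    (decimal, hex, dump)
}

fn main() {
    let bytes: Vec<u8> = vec![0x41, 0x42, 0x43];
    let (decimal, hex, dump) = show_bytes(&bytes);
    println!("{}", decimal); // prints "[65, 66, 67]"
    println!("{}", hex); // prints "[41, 42, 43]"
    println!("{}", dump); // prints "41 42 43"
}
```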
PPP
2

To optimally convert a Vec<u8> possibly containing non-UTF-8 byte sequences into a UTF-8 String without any unneeded allocations, you'll want to optimistically try String::from_utf8() first and fall back to String::from_utf8_lossy() only if that fails.

let buffer: Vec<u8> = ...;

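// `unwrap_or_else` replaces the error with the lossy decode; the original
// `map_err(..).unwrap()` would still panic on invalid UTF-8.
let utf8_string = String::from_utf8(buffer)
    .unwrap_or_else(|non_utf8| String::from_utf8_lossy(non_utf8.as_bytes()).into_owned());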

Calling String::from_utf8_lossy(&buffer).into_owned(), as other answers suggest, results in two owned buffers in memory even in the happy case (valid UTF-8 data in the vector): one holding the original u8 bytes and the other a String that owns a copy of them. This approach instead consumes the Vec<u8> and reuses its buffer as the String directly. Only if that fails does it allocate room for a new string containing the lossily decoded output.
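The same idea as a self-contained sketch, wrapped in a hypothetical `into_utf8_string` function. The pointer comparison checks that the happy path keeps the original heap buffer.

```rust
/// Hypothetical helper: converts `buffer` to a `String`, reusing its allocation
/// when it is valid UTF-8 and falling back to a lossy copy otherwise.
fn into_utf8_string(buffer: Vec<u8>) -> String {
    String::from_utf8(buffer).unwrap_or_else(|err| {
        // Only reached for invalid UTF-8: `err` still owns the original bytes,
        // and the lossy decode allocates a new String with U+FFFD replacements.
        String::from_utf8_lossy(err.as_bytes()).into_owned()
    })
}

fn main() {
    let valid = b"hello".to_vec();
    let ptr = valid.as_ptr();
    let s = into_utf8_string(valid);
    // Happy path: same heap buffer, no second allocation.
    assert_eq!(s.as_ptr(), ptr);

    let invalid = vec![b'h', 0xFF, b'i'];
    println!("{} {}", s, into_utf8_string(invalid)); // prints "hello h�i"
}
```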

Mahmoud Al-Qudsi
-2

let s: String = v.iter().map(|&c| char::from(c)).collect();

stach
  • This is really, really bad approach. From multiple sides. Much better to use the accepted answer. – Chayim Friedman Jul 18 '23 at 02:10
  • In what way is it bad? It generates a proper String from bytes, not trying to pretend the bytes are UTF-8 encoded. Very common case. Such data is returned from various external services. Especially technical descriptions of system state. You want to print it, search it for substrings etc. – stach Jul 18 '23 at 08:07
  • It is both slower than `String::from_utf8()` and does not handle UTF-8 correctly. Yes, sometimes you only need ASCII, but `String::from_utf8()` handles that fine (and faster), as all ASCII is also valid UTF-8. And wanting Unicode _is_ a common need, which this answer does not handle properly. It will compile, and result in gibberish. – Chayim Friedman Jul 18 '23 at 08:10
  • When you have some control characters in the binary vector then it might not be a valid UTF-8. You just want those bytes as successive characters not trying to decode anything or interpret. Is it slower? Probably - it has to make an allocation. I'd say you need a proper tool for each case. – stach Jul 18 '23 at 08:13
  • You have non-ASCII control characters and you want a UTF-8 `String`? This is a really, really rare and weird case. I don't think it deserves an answer on SO. And even if you think it does, it at least deserves a big warning that it does not handle Unicode properly and for 99.9% of the cases you want the accepted answer. – Chayim Friedman Jul 18 '23 at 08:19
  • I come from Python where it is pretty normal to have a Unicode string with any characters in it. Not a UTF8 encoded stream of bytes, but a Unicode string like: '\u0001\u00FF'. Why can't we have it in Rust? – stach Jul 18 '23 at 08:23
  • If this is Unicode, even partially, then your solution doesn't work anyway. And in Rust, unlike Python, `String` is guaranteed fully UTF-8. If you want something that is partially UTF-8, leave it as `Vec`. You may use crates like [`bstr`](https://docs.rs/bstr/latest/bstr/) to do various string operations on it. – Chayim Friedman Jul 18 '23 at 08:48
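A small sketch makes the disagreement above concrete. It wraps the one-liner from this answer in a hypothetical `bytes_as_latin1` function. `char::from(u8)` maps each byte to the code point with the same value, which amounts to reading the bytes as ISO-8859-1 (Latin-1) rather than UTF-8.

```rust
/// Hypothetical wrapper around the one-liner above: maps each byte to the char
/// with the same code point (U+0000..=U+00FF), i.e. reads the bytes as Latin-1.
fn bytes_as_latin1(v: &[u8]) -> String {
    v.iter().map(|&c| char::from(c)).collect()
}

fn main() {
    // "é" encoded as UTF-8 is the two bytes C3 A9.
    let utf8_bytes = "é".as_bytes();

    // Decoding as UTF-8 recovers the original text.
    assert_eq!(std::str::from_utf8(utf8_bytes).unwrap(), "é");

    // Mapping byte by byte yields two chars, U+00C3 and U+00A9.
    assert_eq!(bytes_as_latin1(utf8_bytes), "\u{C3}\u{A9}");
    println!("{}", bytes_as_latin1(utf8_bytes)); // prints "Ã©"

    // A lone 0xFF is rejected by `from_utf8` but becomes U+00FF here.
    assert!(std::str::from_utf8(&[0xFF]).is_err());
    assert_eq!(bytes_as_latin1(&[0xFF]), "\u{FF}");

    // For pure ASCII input the two approaches agree.
    assert_eq!(bytes_as_latin1(b"AAB"), "AAB");
}
```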