How do I convert a Vector of bytes (u8) to a string?

Question

I am trying to write a simple TCP/IP client in Rust, and I need to print out the buffer I got from the server.

How do I convert a Vec<u8> (or a &[u8]) to a String?

score 260 · Accepted Answer · edited Sep 30 '22 at 16:49

To convert a slice of bytes to a string slice (assuming a UTF-8 encoding):

use std::str;

//
// pub fn from_utf8(v: &[u8]) -> Result<&str, Utf8Error>
//
// Assuming buf: &[u8]
//

fn main() {

    let buf = &[0x41u8, 0x41u8, 0x42u8];

    let s = match str::from_utf8(buf) {
        Ok(v) => v,
        Err(e) => panic!("Invalid UTF-8 sequence: {}", e),
    };

    println!("result: {}", s);
}

The conversion happens in place (the resulting &str borrows the original bytes) and does not require an allocation. If necessary, you can create a String from the string slice by calling .to_owned() on it (other options, such as .to_string() or String::from, are also available).
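A minimal sketch of both outcomes, using only the standard library: on success the returned &str points at the very same bytes (nothing is copied until .to_owned()), and on failure Utf8Error::valid_up_to reports the length of the valid prefix. The helper name describe is hypothetical.

```rust
use std::str;

// Hypothetical helper: decode borrowed bytes, copying only on success
// (via .to_owned()), and describe the failure otherwise.
fn describe(buf: &[u8]) -> String {
    match str::from_utf8(buf) {
        Ok(s) => s.to_owned(),
        // valid_up_to() is the length of the longest valid UTF-8 prefix.
        Err(e) => format!("invalid UTF-8 after {} valid bytes", e.valid_up_to()),
    }
}

fn main() {
    let buf = [0x41u8, 0x41, 0x42];
    let s = str::from_utf8(&buf).unwrap();
    // The &str shares memory with `buf`: nothing was copied.
    assert_eq!(s.as_ptr(), buf.as_ptr());

    println!("{}", describe(&buf)); // AAB
    println!("{}", describe(&[0x41, 0xFF, 0x42])); // invalid UTF-8 after 1 valid bytes
}
```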

If you are sure that the byte slice is valid UTF-8, and you don’t want to incur the overhead of the validity check, there is an unsafe version of this function, from_utf8_unchecked, which has the same behavior but skips the check.
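A sketch of how the unchecked variant might be wrapped, assuming the caller can guarantee validity. The helper trusted_to_str is hypothetical; it is marked unsafe itself because the compiler cannot enforce its precondition, and the debug_assert! only runs in debug builds, so release builds skip the scan entirely.

```rust
use std::str;

/// Hypothetical helper for bytes already known to be valid UTF-8.
///
/// # Safety
/// `buf` must be valid UTF-8; passing anything else is undefined behavior.
unsafe fn trusted_to_str(buf: &[u8]) -> &str {
    // Checked only in debug builds; compiled out in release builds.
    debug_assert!(str::from_utf8(buf).is_ok(), "caller passed invalid UTF-8");
    // SAFETY: upheld by the caller, per the contract above.
    unsafe { str::from_utf8_unchecked(buf) }
}

fn main() {
    // ASCII byte-string literals are always valid UTF-8.
    let s = unsafe { trusted_to_str(b"AAB") };
    println!("{}", s);
}
```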

If you need a String instead of a &str, you may also consider String::from_utf8 instead.
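When going the String::from_utf8 route, the error type still owns the input, so a failed conversion does not lose the bytes. A minimal sketch, with a hypothetical helper into_string that hands the original Vec back on failure via FromUtf8Error::into_bytes:

```rust
// Hypothetical helper: turn an owned buffer into a String without copying,
// or hand the original bytes back if they are not valid UTF-8.
fn into_string(buf: Vec<u8>) -> Result<String, Vec<u8>> {
    String::from_utf8(buf).map_err(|e| {
        // utf8_error() exposes the same details as str::from_utf8's error.
        eprintln!("invalid UTF-8 after {} bytes", e.utf8_error().valid_up_to());
        // into_bytes() returns the untouched Vec, so no data is lost.
        e.into_bytes()
    })
}

fn main() {
    println!("{:?}", into_string(vec![0x41, 0x42])); // Ok("AB")
    println!("{:?}", into_string(vec![0x41, 0xFF])); // Err([65, 255])
}
```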

The relevant library references are the standard library documentation for std::str::from_utf8 and std::str::from_utf8_unchecked.

You may want to add that this is possible because Vec coerces to slices — torkleyy, Apr 13 '17 at 07:49
Although it's true that `from_utf8` doesn't allocate, it may be worth mentioning that it needs to scan the data to validate utf-8 correctness. So this is not an O(1) operation (which one may think at first) — Zargony, Jan 24 '19 at 12:16
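Both comments can be seen in a short sketch: a function taking &[u8] accepts &Vec<u8> through deref coercion, and the validation inside from_utf8 still walks every byte. The helper name as_text is hypothetical.

```rust
use std::str;

// Takes a plain slice; callers holding a Vec<u8> can pass `&vec` thanks to
// deref coercion (&Vec<u8> -> &[u8]).
fn as_text(bytes: &[u8]) -> &str {
    // Not O(1): from_utf8 scans every byte to validate it.
    str::from_utf8(bytes).expect("valid UTF-8")
}

fn main() {
    let v: Vec<u8> = vec![0x68, 0x69];
    println!("{}", as_text(&v)); // coerced automatically
    println!("{}", as_text(v.as_slice())); // explicit equivalent
}
```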

score 160 · Answer 2 · edited Aug 24 '16 at 13:38


I prefer String::from_utf8_lossy:

fn main() {
    let buf = &[0x41u8, 0x41u8, 0x42u8];
    let s = String::from_utf8_lossy(buf);
    println!("result: {}", s);
}

It replaces invalid UTF-8 sequences with �, so no error handling is required. It's good for when you don't need error handling, and I hardly ever need it. Strictly speaking, you get a Cow<str> rather than a String, but it can be printed directly. It should make printing out what you're getting from the server a little easier.

Sometimes you may need to call the into_owned() method to get an owned String, since the return value is a Cow (clone-on-write).
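A small sketch of what the Cow means in practice: for valid input, from_utf8_lossy borrows the original bytes (no allocation); for invalid input it allocates a repaired String. into_owned() always produces a String, copying in the borrowed case. Both helper names are hypothetical.

```rust
use std::borrow::Cow;

// Hypothetical helper: true when from_utf8_lossy could borrow the input
// (valid UTF-8), false when it had to allocate a repaired String.
fn lossy_borrows(buf: &[u8]) -> bool {
    matches!(String::from_utf8_lossy(buf), Cow::Borrowed(_))
}

// into_owned() always yields a String, copying if the Cow was borrowed.
fn lossy_string(buf: &[u8]) -> String {
    String::from_utf8_lossy(buf).into_owned()
}

fn main() {
    println!("{}", lossy_borrows(&[0x41, 0x42])); // true
    println!("{}", lossy_borrows(&[0x41, 0xFF])); // false
    println!("{}", lossy_string(&[0x41, 0xFF])); // A�
}
```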

edited Aug 24 '16 at 13:38 by Shepmaster · answered Jan 10 '16 at 18:12 by Bjorn


Thanks a lot for the `into_owned()` suggestion! It was exactly what I was looking for (this makes it a proper `String`, which you can return from a method, for example). – Per Lundberg Nov 25 '16 at 16:34

� is Unicode U+FFFD (UTF-8 sequence 0xEF 0xBF 0xBD (octal 357 277 275)), '[REPLACEMENT CHARACTER](https://www.utf8-chartable.de/unicode-utf8-table.pl?start=65280)'. In some text editors it can be searched for in regular expression mode by `\x{FFFD}`. – Peter Mortensen Apr 10 '21 at 19:18
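The byte sequence in the comment above can be checked directly, and the same character can be counted to see how much of a buffer was damaged. A minimal sketch; the helper replacement_count is hypothetical.

```rust
// The replacement character that from_utf8_lossy inserts.
const REPLACEMENT: char = '\u{FFFD}';

// Hypothetical helper: count how many replacements lossy decoding produced.
fn replacement_count(buf: &[u8]) -> usize {
    String::from_utf8_lossy(buf)
        .chars()
        .filter(|&c| c == REPLACEMENT)
        .count()
}

fn main() {
    // U+FFFD encodes to the three bytes EF BF BD in UTF-8.
    let mut tmp = [0u8; 4];
    assert_eq!(REPLACEMENT.encode_utf8(&mut tmp).as_bytes(), b"\xEF\xBF\xBD");

    // 0xFF and 0xFE never appear in UTF-8, so each becomes one U+FFFD.
    println!("{}", replacement_count(&[0x41, 0xFF, 0x42, 0xFE])); // 2
}
```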

score 94 · Answer 3 · answered Jun 01 '16 at 23:45


If you actually have a vector of bytes (Vec<u8>) and want to convert to a String, the most efficient is to reuse the allocation with String::from_utf8:

fn main() {
    let bytes = vec![0x41, 0x42, 0x43];
    let s = String::from_utf8(bytes).expect("Found invalid UTF-8");
    println!("{}", s);
}
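To see the reuse, one can compare heap pointers before and after the conversion; the String::from_utf8 documentation states that it takes care not to copy the vector. A minimal sketch with a hypothetical helper:

```rust
// Hypothetical helper: convert and report whether the String kept the
// Vec's heap buffer (i.e. no bytes were copied).
fn reuses_allocation(bytes: Vec<u8>) -> bool {
    let before = bytes.as_ptr();
    let s = String::from_utf8(bytes).expect("Found invalid UTF-8");
    s.as_ptr() == before
}

fn main() {
    println!("{}", reuses_allocation(vec![0x41, 0x42, 0x43])); // true
}
```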

answered Jun 01 '16 at 23:45 by Shepmaster


Edit: Note that, as mentioned by @Bjorn Tipling, you might think you can use `String::from_utf8_lossy` here instead, so that you don't need the `expect` call, but the input to that is a slice of bytes (`&'a [u8]`). OTOH, there's also `from_utf8_unchecked`: "If you are sure that the byte slice is valid UTF-8, and you don't want to incur the overhead of the conversion, there is an unsafe version of this function [`from_utf8`], `from_utf8_unchecked`, which has the same behavior but skips the checks." – James Ray Jan 23 '19 at 09:22
Note that you can use `&vec_of_bytes` to convert back into a slice of bytes, as listed in the examples of `from_utf8_lossy`: https://doc.rust-lang.org/std/string/struct.String.html#method.from_utf8_lossy – James Ray Jan 23 '19 at 09:32
@JamesRay is there a way to get the behavior of `from_utf8_lossy` without reallocating? If I start with a `Vec` and then take a reference to it before converting it to a string as in `String::from_utf8_lossy(&my_vec)` I will end up reallocating memory when I don't actually need to. – Michael Dorst Dec 07 '21 at 06:49
Oh, never mind. `from_utf8_lossy` returns a `Cow`, not a `String`. If there are no invalid characters, it won't allocate; if there are, it will. – Michael Dorst Dec 07 '21 at 06:55

score 10 · Answer 4 · answered Jun 20 '21 at 04:11


In my case, I just needed to turn the numbers themselves into a string (not decode them into letters according to some encoding), so I did:

fn main() {
    let bytes = vec![0x41, 0x42, 0x43];
    let s = format!("{:?}", &bytes);
    println!("{}", s);
}
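The Debug format above prints decimal values ("[65, 66, 67]"). If a hex dump is more convenient for inspecting a network buffer, a minimal sketch using only std formatting might look like this; the helper name to_hex is hypothetical and the output has no separators by choice.

```rust
use std::fmt::Write;

// Hypothetical helper: render bytes as lowercase hex, two digits per byte.
fn to_hex(bytes: &[u8]) -> String {
    let mut out = String::with_capacity(bytes.len() * 2);
    for b in bytes {
        // Writing into a String cannot fail, so unwrap never panics here.
        write!(out, "{:02x}", b).unwrap();
    }
    out
}

fn main() {
    let bytes = vec![0x41u8, 0x42, 0x43];
    println!("{:?}", bytes); // [65, 66, 67]
    println!("{}", to_hex(&bytes)); // 414243
}
```

Writing into one preallocated String avoids creating a temporary String per byte, which a `map(|b| format!(...)).collect()` version would do.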

answered Jun 20 '21 at 04:11 by PPP

This is what I had tried, but I felt that it might be wrong or something. Perhaps I'll stick to this for now. – MikeTheSapien Oct 04 '21 at 09:17

score 2 · Answer 5 · answered Nov 03 '22 at 17:57

To optimally convert a Vec<u8> possibly containing non-UTF-8 byte sequences into a UTF-8 String without any unneeded allocations, you'll want to optimistically try String::from_utf8() first and fall back to String::from_utf8_lossy() only if that fails.

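fn to_utf8_string(buffer: Vec<u8>) -> String {
    // Reuse the Vec's allocation when the bytes are valid UTF-8; otherwise
    // fall back to a lossy copy with U+FFFD replacing invalid sequences.
    String::from_utf8(buffer)
        .unwrap_or_else(|non_utf8| String::from_utf8_lossy(non_utf8.as_bytes()).into_owned())
}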

Calling String::from_utf8_lossy(&buffer).into_owned() directly, as in the from_utf8_lossy answer, results in two owned buffers in memory even in the happy case (valid UTF-8 data in the vector): one holding the original u8 bytes and the other a String owning a copy of them. This approach instead consumes the Vec<u8> and reuses its buffer as the String directly; only if that fails does it allocate room for a new string containing the lossily decoded output.
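The two-buffer cost of the lossy-first approach can be observed with a pointer comparison. A minimal sketch; the helper name is hypothetical, and it only demonstrates the valid-input case where the copy is avoidable.

```rust
// Hypothetical helper: true if lossy-decoding the input and calling
// into_owned() produced a second heap buffer alongside the original bytes.
fn lossy_then_owned_copies(buffer: &[u8]) -> bool {
    let owned: String = String::from_utf8_lossy(buffer).into_owned();
    // `buffer` is still alive here, so two copies of the bytes exist.
    owned.as_ptr() != buffer.as_ptr()
}

fn main() {
    let buffer: Vec<u8> = vec![0x41, 0x42, 0x43];
    // Valid UTF-8, yet the borrowed Cow is still copied into a new String.
    println!("{}", lossy_then_owned_copies(&buffer)); // true
}
```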

score -2 · Answer 6 · answered Jul 17 '23 at 21:54


let s: String = v.iter().map(|&c| char::from(c)).collect();

answered Jul 17 '23 at 21:54 by stach

This is really, really bad approach. From multiple sides. Much better to use the accepted answer. – Chayim Friedman Jul 18 '23 at 02:10
In what way is it bad? It generates a proper String from bytes, not trying to pretend the bytes are UTF-8 encoded. Very common case. Such data is returned from various external services. Especially technical descriptions of system state. You want to print it, search it for substrings etc. – stach Jul 18 '23 at 08:07
It is both slower than `String::from_utf8()` and does not handle UTF-8 correctly. Yes, sometimes you only need ASCII, but `String::from_utf8()` handles that fine (and faster), as all ASCII is also valid UTF-8. And wanting Unicode _is_ a common need, which this answer does not handle properly. It will compile, and result in gibberish. – Chayim Friedman Jul 18 '23 at 08:10
When you have some control characters in the binary vector, it might not be valid UTF-8. You just want those bytes as successive characters, without trying to decode or interpret anything. Is it slower? Probably: it has to make an allocation. I'd say you need a proper tool for each case. – stach Jul 18 '23 at 08:13
You have non-ASCII control characters and you want a UTF-8 `String`? This is a really, really rare and weird case. I don't think it deserves an answer on SO. And even if you think it does, it at least deserves a big warning that it does not handle Unicode properly and for 99.9% of the cases you want the accepted answer. – Chayim Friedman Jul 18 '23 at 08:19
I come from Python where it is pretty normal to have a Unicode string with any characters in it. Not a UTF8 encoded stream of bytes, but a Unicode string like: '\u0001\u00FF'. Why can't we have it in Rust? – stach Jul 18 '23 at 08:23
If this is Unicode, even partially, then your solution doesn't work anyway. And in Rust, unlike Python, `String` is guaranteed fully UTF-8. If you want something that is partially UTF-8, leave it as `Vec`. You may use crates like [`bstr`](https://docs.rs/bstr/latest/bstr/) to do various string operations on it. – Chayim Friedman Jul 18 '23 at 08:48
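The disagreement above can be made concrete. char::from(u8) maps each byte to the code point with the same value (U+0000 to U+00FF), which the standard library documentation describes as effectively decoding ISO-8859-1. That gives the Python-style '\u0001\u00FF' string for arbitrary bytes, but it turns multi-byte UTF-8 into mojibake. A minimal sketch; the helper name is hypothetical.

```rust
// Hypothetical helper: map each byte to the char with the same code point.
// This is effectively an ISO-8859-1 (Latin-1) decode, not a UTF-8 one.
fn bytes_as_latin1(v: &[u8]) -> String {
    v.iter().map(|&c| char::from(c)).collect()
}

fn main() {
    let utf8 = "\u{E9}".as_bytes(); // "é" is encoded as [0xC3, 0xA9]
    println!("{}", bytes_as_latin1(utf8)); // Ã© (two chars: mojibake)
    println!("{}", String::from_utf8_lossy(utf8)); // é

    // Arbitrary bytes always map to something, as in the Python example.
    println!("{:?}", bytes_as_latin1(&[0x01, 0xFF])); // "\u{1}ÿ"
}
```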
