Is there a difference between str::as_bytes and CString::as_bytes_with_nul?

Question

Is there any difference between doing this:

name.as_bytes()

and this:

CString::new(name)?.as_bytes_with_nul()

I want to get the bytes from name (which is a String) in a form that I can easily send over the network, and I am not sure whether CString is even necessary here.

score 5 · Answer 1 · answered Feb 09 '19 at 18:01

The documentation of as_bytes_with_nul starts with:

Equivalent to the as_bytes function except that the returned slice includes the trailing nul terminator.

While as_bytes is:

The returned slice does not contain the trailing nul terminator

(emphasis in the original quote)

It is up to you whether you need to transfer the nul byte, and this depends on how you send data over the network (TCP/UDP? raw binary data over TCP? if so, how do you intend to separate messages? JSON? etc.).
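As one illustration of the message-separation question, here is a minimal sketch of length-prefixed framing, which avoids the need for a nul terminator altogether. The helper names write_frame and read_frame and the 4-byte big-endian length prefix are assumptions made for this example, not something taken from the question or from any particular protocol.

```rust
use std::io::{self, Read, Write};

/// Hypothetical helper: writes a 4-byte big-endian length, then the UTF-8 bytes.
fn write_frame<W: Write>(out: &mut W, name: &str) -> io::Result<()> {
    let bytes = name.as_bytes();
    // The length prefix is a u32 in this sketch, so reject longer payloads.
    if bytes.len() > u32::MAX as usize {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "name too long"));
    }
    out.write_all(&(bytes.len() as u32).to_be_bytes())?;
    out.write_all(bytes)
}

/// Hypothetical helper: reads one frame written by `write_frame`.
fn read_frame<R: Read>(input: &mut R) -> io::Result<String> {
    let mut len_buf = [0u8; 4];
    input.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    let mut buf = vec![0u8; len];
    input.read_exact(&mut buf)?;
    // The receiver validates that the payload really is UTF-8.
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}
```

Both helpers are generic over Write and Read, so they should work with a TcpStream as well as with an in-memory Vec<u8> or Cursor. Because the length is sent explicitly, a name that contains a zero byte can be transferred without special handling.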

I should have looked more closely at the docs ... thank you for your time — laptou, Feb 09 '19 at 18:03

Andrey Tyukin · Answer 2 · 2019-02-09T19:18:26.447

As long as your string contains no zero byte (the UTF-8 encoding of U+0000), name.as_bytes() and CString::new(name)?.as_bytes() should give you exactly the same bytes. CString's .as_bytes_with_nul() returns those same bytes followed by the trailing 0 byte. Here is a little demo with a reasonably complicated UTF-8 string:

use std::ffi::CString;

fn main() {
    let message: String = "\nßщ\u{1F601}".to_string();
    println!("bytes_1: {:?}", message.as_bytes());
    println!("bytes_2: {:?}", CString::new(message.clone()).unwrap().as_bytes());
    println!("bytes_3: {:?}", CString::new(message.clone()).unwrap().as_bytes_with_nul());
}

The result is as expected (you might recognize the 10, which corresponds to the ASCII-character \n, which is encoded in the same way in UTF-8):

bytes_1: [10, 195, 159, 209, 137, 240, 159, 152, 129]
bytes_2: [10, 195, 159, 209, 137, 240, 159, 152, 129]
bytes_3: [10, 195, 159, 209, 137, 240, 159, 152, 129, 0]

The problem arises if your string contains U+0000, which is a valid Unicode code point, is encoded as a single 0 byte in UTF-8, and can occur in ordinary Strings. For example:

use std::ffi::CString;

fn main() {
    // Two interior nul characters: valid in a String, not allowed in a CString.
    let message: String = "\n\u{0000}\n\u{0000}".to_string();
    println!("bytes_1: {:?}", message.as_bytes());
    println!(
        "bytes_2: {:?}",
        // CString::new rejects any input that contains a 0 byte.
        match CString::new(message.clone()) {
            Err(e) => format!("an error: {:?}, as expected", e),
            Ok(_) => panic!("won't happen: CString::new must fail on an interior nul byte"),
        }
    );
}

will give you

bytes_1: [10, 0, 10, 0]
bytes_2: "an error: NulError(1, [10, 0, 10, 0]), as expected"

So the simple .as_bytes() succeeds, but the CString version fails. I'd suggest sticking to name.as_bytes() and UTF-8 if possible; there is no reason to convert the string into a CString first.
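To make the trade-off concrete, here is a small sketch that puts the two options side by side. The function names wire_bytes and c_wire_bytes are made up for this example; the second variant only makes sense under the assumption that the receiver expects a C-style nul-terminated string.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper: the plain UTF-8 bytes, no terminator, cannot fail.
fn wire_bytes(name: &str) -> &[u8] {
    name.as_bytes()
}

/// Hypothetical helper: nul-terminated bytes for a C-style receiver.
/// Fails with NulError if `name` already contains a 0 byte.
fn c_wire_bytes(name: &str) -> Result<Vec<u8>, NulError> {
    CString::new(name).map(CString::into_bytes_with_nul)
}
```

The first function borrows and always succeeds, while the second allocates and forces the caller to handle the interior-nul case, which is why the plain version is usually enough when both ends speak UTF-8.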
