As long as there is no 0 code unit in your string, `name.as_bytes()` and `CString::new(name)?.as_bytes()` should give you exactly the same bytes. Additionally, `CString`'s `.as_bytes_with_nul()` will simply append a 0 byte. Here is a little demo with a reasonably complicated UTF-8 string:
```rust
use std::ffi::CString;

fn main() {
    let message: String = "\nßщ\u{1F601}".to_string();
    println!("bytes_1: {:?}", message.as_bytes());
    println!("bytes_2: {:?}", CString::new(message.clone()).unwrap().as_bytes());
    println!("bytes_3: {:?}", CString::new(message.clone()).unwrap().as_bytes_with_nul());
}
```
The result is as expected (you might recognize the `10`, which corresponds to the ASCII character `\n` and is encoded the same way in UTF-8):
```
bytes_1: [10, 195, 159, 209, 137, 240, 159, 152, 129]
bytes_2: [10, 195, 159, 209, 137, 240, 159, 152, 129]
bytes_3: [10, 195, 159, 209, 137, 240, 159, 152, 129, 0]
```
The problem arises if your string contains `U+0000`, which is a valid Unicode code point, is encoded as a single 0 byte in UTF-8, and can occur in ordinary `String`s. For example:
```rust
use std::ffi::CString;

fn main() {
    let message: String = "\n\u{0000}\n\u{0000}".to_string();
    println!("bytes_1: {:?}", message.as_bytes());
    println!(
        "bytes_2: {:?}",
        match CString::new(message.clone()) {
            Err(e) => format!("an error: {:?}, as expected", e),
            Ok(_) => panic!("won't happen: CString::new must fail here."),
        }
    );
}
```
will give you:

```
bytes_1: [10, 0, 10, 0]
bytes_2: "an error: NulError(1, [10, 0, 10, 0]), as expected"
```
So, the simple `.as_bytes()` succeeds, but the `CString` version fails. I'd suggest sticking to `name.as_bytes()` and UTF-8 if possible; there is no reason to convert the string into a `CString` first.
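If you do need a `CString` and want to handle interior 0 bytes gracefully rather than panic, the `NulError` returned by `CString::new` reports where the first 0 byte sits (`nul_position()`) and hands the original bytes back (`into_vec()`), so nothing is lost. A minimal sketch:

```rust
use std::ffi::CString;

fn main() {
    let message = "\n\u{0000}\n\u{0000}".to_string();
    match CString::new(message) {
        Ok(c) => println!("ok: {:?}", c.as_bytes()),
        Err(e) => {
            // Index of the first interior 0 byte in the input.
            println!("first 0 byte at index {}", e.nul_position());
            // `into_vec` returns the original bytes unchanged.
            println!("original bytes: {:?}", e.into_vec());
        }
    }
}
```

This prints `first 0 byte at index 1` and `original bytes: [10, 0, 10, 0]` for the example string above.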