I am learning Rust and I wanted to play around with slices, but this made me "discover" that string literals "o" and "ó" differ in length.

This code:

fn main() {
    let o = String::from("o");
    let oo = String::from("ó");

    println!("o length: {}, ó length: {}", o.len(), oo.len());
}

returns:

o length: 1, ó length: 2

The result of:

println!("{}", String::from("ó"));
println!("{}", "ó");

is exactly the same.

I am asking because I am not well versed in strings, string literals, bytes, and encoding, especially in Rust. Why is the length of "o" 1 and the length of "ó" 2?

hc0re

1 Answer

String::len() returns the length of the string in bytes, not in characters. Strings in Rust are UTF-8 encoded, and the character ó (U+00F3) requires two bytes to encode in UTF-8, whereas the ASCII character o requires only one.
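To make the byte/character distinction concrete, here is a minimal sketch that compares the two counts for both strings. The helper names `byte_len` and `char_count` are made up for this illustration; they only wrap the standard `str::len` and `str::chars` methods. The ó is written as the escape `\u{00F3}` so the example does not depend on how an editor stores the character.

```rust
/// Length of the UTF-8 encoding in bytes (this is what `len()` reports).
/// Hypothetical helper, named only for this example.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values (`char`s) in the string.
/// Hypothetical helper, named only for this example.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "\u{00F3}" is ó as a single code point; UTF-8 encodes it as 0xC3 0xB3.
    for s in ["o", "\u{00F3}"] {
        println!(
            "{}: {} byte(s), {} char(s), raw bytes: {:?}",
            s,
            byte_len(s),
            char_count(s),
            s.as_bytes()
        );
    }
}
```

If the number of characters is what you are after, `s.chars().count()` is usually closer to the intent than `s.len()`, although a `char` is a Unicode scalar value and not necessarily what a reader perceives as one character.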

Note also that there are a few ways to write ó in Unicode. It has its own code point (U+00F3), but there is also a "combining" accent mark (U+0301) that combines with the previous character. A string that writes ó using that mechanism contains two code points: the o character (1 byte in UTF-8) and the combining accent mark (2 bytes in UTF-8), so you could also see a length of 3 bytes. (Playground example)
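The two spellings of ó can be compared with a short sketch like the following. The helper `describe` is a hypothetical name used only here; the strings use `\u{...}` escapes so that it is unambiguous which form each one contains.

```rust
/// Returns (length in bytes, number of `char`s) for a string.
/// Hypothetical helper, named only for this example.
fn describe(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // Precomposed form: a single code point, U+00F3.
    let precomposed = "\u{00F3}";
    // Decomposed form: 'o' followed by U+0301 COMBINING ACUTE ACCENT.
    let decomposed = "o\u{0301}";

    let (bytes, chars) = describe(precomposed);
    println!("precomposed: {} bytes, {} chars", bytes, chars); // 2 bytes, 1 char

    let (bytes, chars) = describe(decomposed);
    println!("decomposed:  {} bytes, {} chars", bytes, chars); // 3 bytes, 2 chars

    // String comparison is byte-wise, so the two forms are not equal
    // even though they normally render the same.
    println!("equal: {}", precomposed == decomposed); // false
}
```

The standard library does not normalize strings, so if both forms need to compare as equal, some normalization step (for example via an external crate) would have to be applied first.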

cdhowie