I am learning Rust, and I was surprised to find that iterating over a string only seems to distinguish Unicode code points (each stored as a UTF-8 byte sequence), not actual grapheme clusters (i.e. a diacritic is treated as a distinct "char").
So, for example, Rust can turn input text into a vector like this (with the help of "नमस्ते".chars()):
['न', 'म', 'स', '्', 'त', 'े'] // 4 and 6 are diacritics and shouldn't be distinct items
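For reference, here is a minimal, std-only sketch that reproduces the behaviour described above. The helper name `split_chars` is just a hypothetical name for illustration, not anything from the standard library; the only real API used is `str::chars`.

```rust
// Hypothetical helper (illustrative name): collects the Unicode scalar
// values yielded by `str::chars` into a vector.
fn split_chars(text: &str) -> Vec<char> {
    text.chars().collect()
}

fn main() {
    let text = "नमस्ते";
    let items = split_chars(text);

    // Print each item with its position and code point, so the combining
    // marks are visible as separate entries.
    for (i, c) in items.iter().enumerate() {
        println!("{}: U+{:04X}", i + 1, *c as u32);
    }

    // `items.len()` counts `char`s, while `text.len()` counts UTF-8 bytes.
    println!("{} chars, {} bytes", items.len(), text.len());
}
```

The vowel sign and the virama each come out as their own `char`, which is exactly the splitting I want to avoid.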
But how do I get a vector like this?
["न", "म", "स्", "ते"]