You can use nchar with type="chars" for the number of characters and with type="bytes" for the number of bytes:
nchar("bi\u00dfchen", type="chars")
#[1] 7
nchar("bi\u00dfchen", type="bytes")
#[1] 8
The help page (?nchar) explains how the size of a string is computed:
The ‘size’ of a character string can be measured in one of three ways (corresponding to the type argument):
bytes: The number of bytes needed to store the string (plus in C a final terminator which is not counted).
chars: The number of human-readable characters.
width: The number of columns cat will use to print the string in a monospaced font. The same as chars if this cannot be calculated.
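To compare the three measurements side by side, here is a minimal sketch. The variable names x and sizes are only illustrative, and enc2utf8 is used on the assumption that you want the byte count for the UTF-8 representation; the width value can depend on your platform and locale, so no output is shown.

```r
# Illustrative example: measure the same string in all three ways
x <- enc2utf8("bi\u00dfchen")  # force UTF-8 so the byte count is well defined

# Call nchar once per type and collect the results in a named integer vector
sizes <- vapply(c("bytes", "chars", "width"),
                function(t) nchar(x, type = t),
                integer(1))
sizes
```

The "bytes" entry exceeds the "chars" entry here because ß takes two bytes in UTF-8.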
If you want the number of "symbols" in a string with any non-ASCII characters written out as escape sequences (that is, counting \u00df as six symbols instead of interpreting it as one character), you can use the function stri_escape_unicode from the stringi package:
library(stringi)
nchar(stri_escape_unicode("bi\u00dfchen")) # same as stri_length(stri_escape_unicode("bi\u00dfchen"))
# [1] 12
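For comparison, stringi also has its own counting functions that work on the unescaped string. A small sketch, assuming stringi is installed and the string is stored as UTF-8 (the variable x is illustrative):

```r
library(stringi)

x <- enc2utf8("bi\u00dfchen")

n_points <- stri_length(x)    # number of code points
n_bytes  <- stri_numbytes(x)  # number of bytes in the stored string
n_points
n_bytes
```

For this string these should agree with nchar(x, type="chars") and nchar(x, type="bytes") respectively.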