I want to count the number of "letters" in text written in non-Western languages such as Hindi. I put "letters" in quotation marks because, if I'm not mistaken, a character in Mandarin, for example, does not necessarily represent a letter but something closer to a word.
With Western languages (Latin script), the following works:
library(stringr)
western_text <- "This is my text"
str_count(tolower(western_text), "[a-z]")
# [1] 12
Now I try the same with a Hindi response:
hindi_text <- "बहुत सी"
str_count(tolower(hindi_text), "[a-z]")
# [1] 0
So the question is: how can I count the letter equivalents in Hindi text (and potentially in other non-Latin scripts such as Chinese or Cyrillic)?
Update: I guess I will probably need to create some sort of lookup list of all non-Western alphabets to match against?
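A lookup list may not be needed. A minimal sketch, assuming stringr's default ICU regex engine: the Unicode property class `\p{L}` matches any code point in the "Letter" category regardless of script, and `\p{M}` matches combining marks. Whether the Devanagari vowel signs (which are marks, not letters) should count as "letters" is a judgment call I have not settled, so both variants are shown.

```r
library(stringr)

western_text <- "This is my text"
hindi_text <- "बहुत सी"

# \p{L}: any Unicode letter, in any script
str_count(western_text, "\\p{L}")
# [1] 12
str_count(hindi_text, "\\p{L}")
# [1] 4

# Devanagari vowel signs are combining marks (\p{M}), not letters,
# so they are only counted if marks are included explicitly
str_count(hindi_text, "[\\p{L}\\p{M}]")
# [1] 6
```

With `\p{L}` alone, only the four consonants in the Hindi string are counted. Adding `\p{M}` also counts the two vowel signs.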