I have an R function that tries to capitalise the first letter of every "word":
proper = function(x) {
  gsub("(?<=\\b)([[:alpha:]])", "\\U\\1", x, perl = TRUE)
}
This works pretty well, but when a word contains a macron, as in the word Māori,
I get improper capitalisation, e.g.
> proper("Māori")
[1] "MāOri"
Clearly the regex engine treats the macronised ā as a non-word character, so it
sees a word boundary between the ā and the o. I'm not sure why.