I need your help because I keep getting the same error no matter which approach I try. I want to replace special characters in a data frame: "áéíóúÁÉÍÓÚýÝ", "àèìòùÀÈÌÒÙ", "âêîôûÂÊÎÔÛ", "ãõÃÕñÑ", "äëïöüÄËÏÖÜÿ", "çÇ" should become "aeiouAEIOUXX", "aeiouAEIOU", "AEIOUAEIOU", "AOAOXX", "AEIOUAEIOUX", "XX" respectively. Thank you!
First I tried doing this:
library(stringr)  # provides str_to_upper(), str_trim(), str_replace_all() and re-exports %>%

trata <- function(Campo) {
  Campo <- Campo %>%
    chartr('ÇÆ£ØÞß&@Ð', 'XXXXXXXXX', .) %>%
    str_to_upper(locale = "es") %>%
    str_trim(side = "both") %>%
    str_replace_all("['´`^]", "") %>%
    chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕÑ', 'AEIOUAEIOUAEIOUAEIOUAAOX', .)
  return(Campo)
}
trataRS <- function(Campo) {
  Campo <- Campo %>%
    chartr('ÇÆ£ØÞßÐ', 'XXXXXXXXX', .) %>%
    str_to_upper(locale = "es") %>%
    str_trim(side = "both") %>%
    str_replace_all("['´`^]", "") %>%
    chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕ', 'AEIOUAEIOUAEIOUAEIOUAAO', .)
  return(Campo)
}
Then I applied these functions to two columns:
Base$paterno_originador<-trata(Base$paterno_originador)
Base$razon_originador <- trataRS(Base$razon_originador)
But I got this error:
Error in chartr("ÇÆ£ØÞßÐ","XXXXXXXXX",.) : invalid input 'HÉCTOR' in 'utf8towcs'
So I tried a different way that I found here from @Alexandre_Lima:
rm_accent <- function(str, pattern = "all") {
  if (!is.character(str))
    str <- as.character(str)
  pattern <- unique(pattern)
  if (any(pattern == "Ç"))
    pattern[pattern == "Ç"] <- "ç"
  symbols <- c(
    acute = "áéíóúÁÉÍÓÚýÝ",
    grave = "àèìòùÀÈÌÒÙ",
    circunflex = "âêîôûÂÊÎÔÛ",
    tilde = "ãõÃÕñÑ",
    umlaut = "äëïöüÄËÏÖÜÿ",
    cedil = "çÇ"
  )
  nudeSymbols <- c(
    acute = "aeiouAEIOUyY",
    grave = "aeiouAEIOU",
    circunflex = "AEIOUAEIOU",
    tilde = "AOAOXX",
    umlaut = "AEIOUAEIOUX",
    cedil = "XX"
  )
  accentTypes <- c("´", "`", "^", "~", "¨", "ç")
  if (any(c("all", "al", "a", "todos", "t", "to", "tod", "todo") %in% pattern)) # option: remove all accents
    return(chartr(paste(symbols, collapse = ""), paste(nudeSymbols, collapse = ""), str))
  for (i in which(accentTypes %in% pattern))
    str <- chartr(symbols[i], nudeSymbols[i], str)
  return(str)
}
But I got a similar error:
Error in chartr(paste(symbols, collapse = ""), paste(nudeSymbols, collapse = ""), :
invalid input 'RUÍZ' in 'utf8towcs'
I am including this to show the encoding. "UTF-8" appears for the values in that column that contain a special character:
Encoding(Base$nombre_originador)
[1] "unknown" "UTF-8" "unknown" "UTF-8"
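In case it helps with diagnosing this, here is a minimal sketch of the check I think is relevant. It rests on an assumption I have not confirmed: that some values in my column hold Latin-1 bytes rather than valid UTF-8, which would explain why `chartr` complains in 'utf8towcs'. The vector `x` is made-up sample data, not my real column, and the Latin-1 guess for the source encoding may be wrong for other files. It uses only base R (`validUTF8`, `iconv`, `chartr`), and the accented letters are written as escapes so the snippet does not depend on how the script file itself is saved.

```r
# Hypothetical sample data: the second value is built from raw Latin-1 bytes
# (0xCD is "Í" in Latin-1), so it is NOT a valid UTF-8 byte sequence.
bad <- rawToChar(as.raw(c(0x52, 0x55, 0xCD, 0x5A)))  # "RU<cd>Z"
x <- c("JUAN", bad, "H\u00c9CTOR")                   # "\u00c9" is É, valid UTF-8

# validUTF8() inspects the bytes and ignores the declared Encoding() mark.
ok <- validUTF8(x)
print(ok)

# Assumption: the invalid values are Latin-1. Re-encode only those to UTF-8.
x[!ok] <- iconv(x[!ok], from = "latin1", to = "UTF-8")
stopifnot(all(validUTF8(x)))

# With every value valid UTF-8, chartr() can map the accented capitals.
# "\u00c1\u00c9\u00cd\u00d3\u00da" is "ÁÉÍÓÚ".
clean <- chartr("\u00c1\u00c9\u00cd\u00d3\u00da", "AEIOU", x)
print(clean)
```

On my real data the equivalent check would be something like `which(!validUTF8(Base$nombre_originador))`, to list the rows that would need re-encoding before calling `trata()` or `rm_accent()`.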