
I'm trying to remove accents from a string while keeping the letter ñ.

"DIVISIÓN DE MONTAÑA" |> stringi::stri_trans_general("Latin-ASCII") removes all the accents, but it also turns ñ into n, giving DIVISION DE MONTANA.

Is there another transformation available in stringi that keeps the letter ñ?

dzegpi

3 Answers

x <- "DIVISIÓN DE MONTAÑA"

# Split into single characters; transliterate each one unless it is ñ or Ñ
paste(
  sapply(strsplit(x, "")[[1]], function(ch) {
    if (ch %in% c("Ñ", "ñ")) ch else stringi::stri_trans_general(ch, "Latin-ASCII")
  }),
  collapse = ""
)

# [1] "DIVISION DE MONTAÑA"
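The one-liner above only handles a single string, because strsplit(x, "")[[1]] keeps just the first element. A minimal sketch of the same per-character idea applied to a whole character vector, assuming the stringi package is installed; the helper name ascii_keep_enye is hypothetical:

```r
library(stringi)

# Hypothetical helper: apply the per-character approach to every element of a
# character vector, leaving the characters in `keep` untouched.
ascii_keep_enye <- function(x, keep = c("\u00f1", "\u00d1")) {
  vapply(x, function(s) {
    chars <- strsplit(s, "")[[1]]                 # one string -> single characters
    convert <- !(chars %in% keep)                 # everything except the kept ones
    chars[convert] <- stri_trans_general(chars[convert], "Latin-ASCII")
    paste(chars, collapse = "")                   # glue the characters back together
  }, character(1), USE.NAMES = FALSE)
}

ascii_keep_enye(c("DIVISI\u00d3N DE MONTA\u00d1A", "monta\u00f1a"))
```

Transliterating only the selected characters in one vectorised call per string avoids the ifelse() per character, but it is still a split-and-paste approach and will be slower than the alternatives on long text.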
Merijn van Tilborg
  • This does the job, but it is not what I'm looking for (a transformation in the stringi package or similar). – dzegpi Apr 13 '22 at 15:38
  • "Is there other transformation available in stringi, that keeps the letter ñ?" Then my corrected answer is **NO**, there is not. There are other ways to convert to ASCII or other encodings, but they all convert everything; none supports an exclude list like the one you want. There are several workarounds: mine as posted, the approach Wietse showed in his answer, or defining your own list of the characters you want to convert (you can look them all up) along with a list of substitutes, and replacing each character with its substitute. – Merijn van Tilborg Apr 13 '22 at 20:26
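The last workaround, a hand-made list of characters and their substitutes, can be sketched in base R with chartr(), which replaces each character of old with the character at the same position in new. The list below is an assumption covering accented Spanish vowels only; ñ and Ñ are simply left out of it, so they pass through unchanged.

```r
# Characters to convert and their one-to-one ASCII substitutes
# (assumed list: accented Spanish vowels only; ñ/Ñ deliberately not listed)
accented <- "\u00c1\u00c9\u00cd\u00d3\u00da\u00dc\u00e1\u00e9\u00ed\u00f3\u00fa\u00fc"
plain    <- "AEIOUUaeiouu"

x <- "DIVISI\u00d3N DE MONTA\u00d1A"
chartr(accented, plain, x)  # returns "DIVISION DE MONTAÑA"
```

chartr() only does one-to-one swaps, so unlike Latin-ASCII it cannot expand a character such as æ into ae; for that you would need a replacement table with stringi::stri_replace_all_fixed() or similar.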

One way to get around it is to replace ñ with a character you don't need before the transliteration and replace it back afterwards, like so:

"DIVISIÓN DE MONTAÑñA" |> 
  stringi::stri_replace_all_regex(pattern = c("ñ", "Ñ"), replacement = c("¬", "¤"), vectorize_all = FALSE) |> 
  stringi::stri_trans_general("Latin-ASCII") |> 
  stringi::stri_replace_all_regex(pattern = c("¬", "¤"), replacement = c("ñ", "Ñ"), vectorize_all = FALSE)

It's not the prettiest solution, though, and you have to make sure the placeholder symbols are not transliterated themselves. The advantage is that it is very quick and saves time compared with splitting the string character by character with strsplit.
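A sketch that generalises this placeholder trick to any set of characters, assuming the stringi package. The placeholders here are taken from Unicode's Private Use Area (U+E000 onwards) rather than ¬ and ¤, on the assumption that Latin-ASCII leaves them alone; instead of trusting that, the helper checks it, and it also refuses input that already contains such a character. The function name trans_keep is hypothetical.

```r
library(stringi)

# Hypothetical helper: transliterate `x` with transform `id` while protecting
# the characters in `keep`, via temporary placeholder characters.
trans_keep <- function(x, keep = c("\u00f1", "\u00d1"), id = "Latin-ASCII") {
  # One Private Use Area code point (U+E000, U+E001, ...) per protected character
  placeholders <- intToUtf8(0xE000 + seq_along(keep) - 1, multiple = TRUE)

  # Guard 1: the placeholders must come through the transliteration unchanged
  stopifnot(all(stri_trans_general(placeholders, id) == placeholders))
  # Guard 2: the input must not already contain any Private Use Area character
  stopifnot(!any(stri_detect_regex(x, "[\\uE000-\\uF8FF]"), na.rm = TRUE))

  out <- stri_replace_all_fixed(x, keep, placeholders, vectorize_all = FALSE)  # protect
  out <- stri_trans_general(out, id)                                          # transliterate
  stri_replace_all_fixed(out, placeholders, keep, vectorize_all = FALSE)      # restore
}

trans_keep(c("DIVISI\u00d3N DE MONTA\u00d1A", "monta\u00f1a"))
```

Fixed (non-regex) matching is used for the swaps because the protected characters are plain literals, which also means characters such as . or ^ could be kept without escaping.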

koolmees

I think I came across a similar issue and may have found a solution using stringi::stri_trans_general that might help you.

My case is a bit simpler as I want to change one of the rules in "Latin-ASCII" (converting å to aa instead of a), whereas you want to ignore one of the rules.

stri_trans_general accepts custom transliteration rules. Drawing inspiration from koolmees' answer, you change ñ to a placeholder character that "Latin-ASCII" will not transliterate, and then change it back at the end of the custom rule list.

I am not an expert on this, but more information about custom rules can be found here or here.

# Your case
custom_rules <- "ñ > \\~;
                 Ñ > \\^;
                 ::Latin-ASCII;
                 \\~ > ñ;
                 \\^ > Ñ"

stringi::stri_trans_general(c("DIVISIÓN DE MONTAÑA", "montaña"), id = custom_rules, rules = TRUE)

# My simpler case
my_rules <- "å > aa;
             Å > Aa;
             ::Latin-ASCII;"

stringi::stri_trans_general(c("Århus", "Tårnby"), id = my_rules, rules = TRUE)

This solution also lets me keep the list of rules in another script, source it when needed and apply it in a single stri_trans_general call. You can also add more rules, e.g. one that replaces hyphens with a space, as below.

# Add rule to also replace hyphens with a space
hyphen <- "å > aa;
          '-' > ' ';
          ::Latin-ASCII;"
stringi::stri_trans_general("å-å", id = hyphen, rules = TRUE)
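Because the rules are just a string, they can also be generated from a vector of characters to keep. A sketch, assuming stringi and using Private Use Area code points written as ICU \uXXXX escapes as placeholders; keep_rules is a hypothetical name. Unlike ~ and ^, these placeholders are very unlikely to occur in real text, so a tilde or caret already present in the input is not turned into ñ or Ñ on the way back.

```r
library(stringi)

# Hypothetical helper: build an ICU rule string that protects `keep` from
# Latin-ASCII by routing each character through a placeholder.
# Assumes the kept characters are plain letters, not ICU rule syntax characters.
keep_rules <- function(keep) {
  # ICU rule syntax accepts \uXXXX escapes; sprintf() yields e.g. "\uE000"
  ph <- sprintf("\\u%04X", 0xE000 + seq_along(keep) - 1)
  paste(c(
    paste0(keep, " > ", ph, ";"),   # protect, e.g. "ñ > \uE000;"
    "::Latin-ASCII;",               # transliterate everything else
    paste0(ph, " > ", keep, ";")    # restore, e.g. "\uE000 > ñ;"
  ), collapse = "\n")
}

rules <- keep_rules(c("\u00f1", "\u00d1"))
stri_trans_general(c("DIVISI\u00d3N DE MONTA\u00d1A", "monta\u00f1a"), id = rules, rules = TRUE)
```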
Thranholm