
I have a vector of country names, such as x:

x <- c("c\u00f4te", "côte")

tools::showNonASCII(x)
1: c<c3><b4>te
2: c<f4>te
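The two byte dumps above are two encodings of the same character. In UTF-8, ô takes two bytes (C3 B4), while in latin1 it is the single byte F4. Here is a minimal sketch using base R's charToRaw(). It assumes the latin1 string is built from a \x byte escape plus Encoding<- (the pattern from the ?Encoding examples), so the result does not depend on the encoding of the script file:

```r
# UTF-8 version: a \u escape is stored as UTF-8
utf8_cote <- "c\u00f4te"

# latin1 version: a raw 0xF4 byte, then declared as latin1
latin1_cote <- "c\xf4te"
Encoding(latin1_cote) <- "latin1"

# Inspect the underlying bytes of each string (no translation is done)
print(charToRaw(utf8_cote))    # 63 c3 b4 74 65
print(charToRaw(latin1_cote))  # 63 f4 74 65
```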


iconv(x, to="ASCII//TRANSLIT")
[1] "cA?te" "cote" 

Encoding(x)
[1] "UTF-8"  "latin1"

I would like to unify them. How can I use str_replace to convert \u00f4 to ô, and convert all elements of x to latin1?
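Note that \u00f4 is only escape notation for the character ô. Both elements already hold the same text under different declared encodings, so no string replacement should be needed. The sketch below uses base R and is not the answer's method. It first unifies the encoding with enc2utf8() and then converts to latin1 with iconv(). The latin1 element is rebuilt from a \x byte escape, which is an assumption made so the example does not depend on the script's file encoding:

```r
# Rebuild the question's two elements without relying on the script's file encoding
latin1_cote <- "c\xf4te"           # raw byte 0xF4 ...
Encoding(latin1_cote) <- "latin1"  # ... declared as latin1
x <- c("c\u00f4te", latin1_cote)   # the \u escape is stored as UTF-8

print(Encoding(x))   # "UTF-8"  "latin1"
print(x[1] == x[2])  # TRUE: same text, different storage

# Step 1: give every element the same (UTF-8) encoding; enc2utf8() honours each mark
x_utf8 <- enc2utf8(x)
print(Encoding(x_utf8))  # "UTF-8" "UTF-8"

# Step 2: if latin1 is really needed, convert from the now-uniform UTF-8
x_latin1 <- iconv(x_utf8, from = "UTF-8", to = "latin1")
print(Encoding(x_latin1))  # "latin1" "latin1"
```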

Tom Blodget
Mohamed

1 Answer


Checking the encoding of the vector elements:

x <- c("c\u00f4te", "côte", "cote")
sapply(x, Encoding, USE.NAMES = TRUE)  # encoding mark of each element

I get a mix of encodings. So iconv can't be called on the whole vector at once, because it applies one fixed "from" encoding to every element.
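One way around the fixed "from" argument in base R is to call iconv() once per element, passing each element's own declared encoding. In the sketch below, iconv_per_element is a hypothetical helper, not part of any package. It maps "unknown" (pure ASCII, which R never marks) to "", the native encoding. The last step compares the helper with a vectorised alternative that first normalises every element with enc2utf8():

```r
# Same mixed vector as in the answer, built without depending on file encoding
latin1_cote <- "c\xf4te"
Encoding(latin1_cote) <- "latin1"
x <- c("c\u00f4te", latin1_cote, "cote")
print(Encoding(x))  # "UTF-8" "latin1" "unknown" (pure ASCII is never marked)

# Hypothetical helper: run iconv() once per element, using that element's own mark
iconv_per_element <- function(v, to) {
  from <- Encoding(v)
  from[from == "unknown"] <- ""  # "" tells iconv() to use the native encoding
  unname(mapply(iconv, v, from = from, MoreArgs = list(to = to)))
}

y <- iconv_per_element(x, to = "latin1")

# Vectorised alternative: normalise marks first, then one 'from' fits every element
z <- iconv(enc2utf8(x), from = "UTF-8", to = "latin1")
print(identical(lapply(y, charToRaw), lapply(z, charToRaw)))  # TRUE
```

For ASCII transliteration ("ASCII//TRANSLIT"), the output of iconv() depends on the platform's iconv implementation. That may be one reason a library-level transliterator is attractive.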

stringi provides a convenient function that adapts to each element's "from" encoding and transliterates to ASCII:

stringi::stri_trans_general(x, "latin-ascii")
Tom Blodget
  • Good answer. What if I want all of the vector's elements to become latin1 or UTF-8? I.e., turn "c\u00f4te", "cote" into "côte", or "côte", "cote" into "c\u00f4te"? – Mohamed Mar 23 '18 at 15:11
  • @Mohamed How would any kind of conversion know that "cote" is more properly written in French as "côte"? – Tom Blodget Mar 23 '18 at 16:14
  • I work with data on 57 countries, including Côte d'Ivoire and Réunion. I do my calculations in R, then export the data frame (one column is the country; the other columns are indicators and years) with write.csv. However, when I open the file in Excel, the ô in c\u00f4 becomes an A with a tilde (Ã). So I want to export in latin1, with côte rather than cote. – Mohamed Mar 23 '18 at 16:37
  • I came across an article saying that \x is used for letters in latin1: https://stackoverflow.com/questions/37930717/converting-accents-to-ascii-in-r So \x00f4 is ô in latin1... – Mohamed Mar 23 '18 at 17:13
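For the Excel export problem, base R's write.csv() passes a fileEncoding argument on to write.table(), so the file can be written in latin1 directly. The sketch below assumes a toy data frame (the column names and values are made up) and writes to a temporary file. It then checks the raw bytes. In R, a \x escape takes one or two hex digits, so the latin1 byte for ô is written "\xf4":

```r
# Toy data frame (hypothetical columns) with non-ASCII country names
df <- data.frame(
  country = c("C\u00f4te d'Ivoire", "R\u00e9union"),
  year = c(2017, 2017),
  indicator = c(1.5, 2.5),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".csv")

# fileEncoding is passed on to write.table(), which re-encodes text as it is written
write.csv(df, path, row.names = FALSE, fileEncoding = "latin1")

# Inspect the raw file: in latin1, ô is the single byte 0xF4 ("\xf4" in R)
bytes <- readBin(path, what = "raw", n = file.size(path))
print(any(bytes == as.raw(0xf4)))  # TRUE

# Reading it back needs the same declaration
back <- read.csv(path, fileEncoding = "latin1", stringsAsFactors = FALSE)
print(back$country)
```

Excel on Windows typically decodes a CSV that has no byte-order mark using the system code page, often Windows-1252. That code page agrees with latin1 for ô and é, which would explain why UTF-8's two bytes C3 B4 show up as "Ã´". This route only works for characters that exist in latin1.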