
I'm trying to convert special characters to ASCII in R. I tried using Hadley's advice in this question:

stringi::stri_trans_general('Jos\xe9', 'latin-ascii')

But I get "Jos�". I'm using stringi v1.1.1.

I'm running a Mac. My friends who are running Windows machines seem to get the desired result of "Jose".

Any idea what is going on?

Huey
  • Store the result in a variable. Let's call it `s`. What is the result of `sapply(1:nchar(s), function(i){ return(charToRaw(substr(s, i, i))) })` on Mac? On Windows, the result is a `raw` vector with values `4a 6f 73 65`. –  Jun 20 '16 at 20:09
  • Also, is `"latin-ascii"` included in `stringi::stri_trans_list()`? Does `stringi::stri_trans_general('Jos\xe9', 'Latin-ASCII')` work as you expect? –  Jun 20 '16 at 20:21
  • When running the sapply snippet you mentioned, I get: "Error in nchar(s) : invalid multibyte string, element 1". Using 'Latin-ASCII' instead of 'latin-ascii' doesn't help either. – Huey Jun 20 '16 at 20:40
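The `nchar()` error is itself a clue: in a UTF-8 session R cannot interpret the lone byte 0xe9 as a character, so character-level functions fail. The following is a minimal base-R sketch, assuming R 3.3.0 or later for `validUTF8()`. It inspects the bytes directly without calling `nchar()`:

```r
x <- 'Jos\xe9'

# Inspect the raw bytes; charToRaw() does not need to decode the string
bytes <- charToRaw(x)
print(bytes)  # 4a 6f 73 e9

# 0xe9 ("é" in Latin-1) is not a valid UTF-8 sequence on its own
print(validUTF8(x))  # FALSE
```

On Windows, `'Jose'` would end in `65` ("e"). Here the last byte is `e9`, which shows the string is still holding Latin-1 bytes.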

1 Answer


The default encoding on Windows (typically Windows-1252, a superset of Latin-1, in Western locales) is different from the usual default on other operating systems, which is UTF-8. The byte `\xe9` in x = 'Jos\xe9' means "é" in Latin-1, but on its own it is not valid UTF-8. So on Linux or OS X you need to tell R which encoding the string uses:

library(stringi)
x <- 'Jos\xe9'
Encoding(x) <- 'latin1'                # declare that the bytes are Latin-1
stri_trans_general(x, 'Latin-ASCII')   # returns "Jose"
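You can also convert the bytes to UTF-8 instead of relabelling the string with `Encoding<-`. This is a minimal sketch that assumes the input really is Latin-1. It uses `stringi::stri_encode()` to re-encode the string and then transliterates it as above:

```r
library(stringi)

x <- 'Jos\xe9'  # raw bytes 4a 6f 73 e9, assumed to be Latin-1

# Convert the Latin-1 bytes into a proper UTF-8 string ("José")
y <- stri_encode(x, from = 'latin1', to = 'UTF-8')

# Transliterate accented Latin letters to plain ASCII
z <- stri_trans_general(y, 'Latin-ASCII')
print(z)  # "Jose"
```

The trade-off is that `Encoding<-` only changes R's label on the existing bytes. `stri_encode()` produces a new UTF-8 string, which is easier to pass along to other code that expects UTF-8.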
Ista