While cleaning text data in R, I sometimes find special characters. To get rid of them, I need to know their Unicode escapes; for example, € is \u20AC. Is there a function that takes a string containing a special character as input and shows its Unicode escape?
- The following posts [here](http://stackoverflow.com/questions/17761858/converting-a-u-escaped-unicode-string-to-ascii) and [here](http://stackoverflow.com/questions/16028658/unicode-conversion-and-export-in-r) may shed some light on the issue. Also, in an internet search, I came across a package called "Unicode" that may be worth a gander. – lmo Jun 08 '16 at 13:17
- You may also try the function `iconv` – Cath Jun 08 '16 at 13:17
- What is the original encoding? – C8H10N4O2 Jun 08 '16 at 13:25
- The original encoding is UTF-8. In the console, I can see "é" correctly; however, using iconv, I have "é"; I would like to see "\u00E9". – John Smith Jun 08 '16 at 13:38
- Possible duplicate of [Replace accented characters in R with non-accented counterpart (UTF-8 encoding)](http://stackoverflow.com/questions/20495598/replace-accented-characters-in-r-with-non-accented-counterpart-utf-8-encoding) – mik Feb 09 '17 at 15:58
2 Answers
Referring to Cath's comment, `iconv` can do the job:

    iconv("é", toRaw = TRUE)

Then, you may want to `unlist` the result and paste it with `\u00`.
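One caveat with the `iconv` route: `toRaw = TRUE` returns the bytes of the string in its encoding, so in a UTF-8 session "é" comes back as the two bytes `c3 a9` rather than the code point `e9`. As an alternative, here is a minimal sketch that goes through base R's `utf8ToInt` instead. The helper name `to_unicode_escapes` is hypothetical (it is not part of any package), and the sketch assumes the input is a single string.

```r
# Hypothetical helper (not from any package): return the \uXXXX escape
# for every character of a single string.
to_unicode_escapes <- function(x) {
  x <- enc2utf8(x)                  # make sure the string is UTF-8
  code_points <- utf8ToInt(x)       # one integer code point per character
  sprintf("\\u%04X", code_points)   # format as upper-case hex, at least 4 digits
}

# cat() shows the escapes without doubling the backslash
cat(to_unicode_escapes("\u00e9"), "\n")   # the character é
cat(to_unicode_escapes("\u20ac"), "\n")   # the character €
```

Code points above `FFFF` come out with more than four hex digits; in R source those are written with the `\U` escape rather than `\u`.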

stephLH