While cleaning text data in R, I sometimes come across special characters. To get rid of them, I need to know their Unicode escapes; for example, € is \u20AC. Is there a function that takes a string containing a special character as input and shows me its Unicode code point?

John Smith
  • The following posts [here](http://stackoverflow.com/questions/17761858/converting-a-u-escaped-unicode-string-to-ascii) and [here](http://stackoverflow.com/questions/16028658/unicode-conversion-and-export-in-r) may shed some light on the issue. Also, in an internet search, I came across a package called "Unicode" that may be worth a gander. – lmo Jun 08 '16 at 13:17
  • You may also try the function `iconv` – Cath Jun 08 '16 at 13:17
  • what is the original encoding? – C8H10N4O2 Jun 08 '16 at 13:25
  • The original encoding is UTF-8. In the console I can see "é" correctly; however, using iconv I get "é", whereas I would like to see "\u00E9". – John Smith Jun 08 '16 at 13:38
  • Possible duplicate of [Replace accented characters in R with non-accented counterpart (UTF-8 encoding)](http://stackoverflow.com/questions/20495598/replace-accented-characters-in-r-with-non-accented-counterpart-utf-8-encoding) – mik Feb 09 '17 at 15:58

2 Answers

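# Requires the Unicode package: install.packages("Unicode")
special_char <- "%"
# utf8ToInt() gives the integer code point; as.u_char() displays it as U+XXXX
Unicode::as.u_char(utf8ToInt(special_char))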
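
If you would rather stay in base R and see the escapes for a whole string, something along these lines might work. This is only a sketch: `to_unicode_escapes` is a hypothetical helper name, it expects a single string, and it assumes the input is valid UTF-8 (or convertible to it with `enc2utf8`).

```r
# Hypothetical helper: replace every non-ASCII character in a single string
# with its \uXXXX (or \UXXXXXXXX) escape, leaving ASCII characters untouched.
to_unicode_escapes <- function(x) {
  cps <- utf8ToInt(enc2utf8(x))                 # integer code points
  chars <- intToUtf8(cps, multiple = TRUE)      # one string per code point
  escapes <- ifelse(
    cps > 0xFFFF,
    sprintf("\\U%08X", cps),                    # outside the BMP: 8 hex digits
    sprintf("\\u%04X", cps)                     # inside the BMP: 4 hex digits
  )
  paste(ifelse(cps < 128L, chars, escapes), collapse = "")
}

# The input is written with escapes so the example does not depend on the
# encoding of the script file; it is the string "café €".
cat(to_unicode_escapes("caf\u00e9 \u20ac"), "\n")
# prints: caf\u00E9 \u20AC
```

Once you know the escapes, you can use them in a pattern for `gsub` to remove or replace the characters.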
Felix Dietrich
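Referring to Cath's comment, iconv can do the job:

iconv("é", from = "UTF-8", to = "UTF-16BE", toRaw = TRUE)

This returns the raw bytes 00 e9. You can then unlist them and paste them after \u to get \u00e9. (Without to = "UTF-16BE" you get the bytes of your native encoding, which for UTF-8 are c3 a9 and do not match the code point.)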
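
The unlist-and-paste step could look roughly like this. The sketch requests UTF-16BE output explicitly so that the raw bytes line up with the code point, and it only covers characters in the Basic Multilingual Plane (four hex digits); the variable names are just for illustration.

```r
# The input is written as an escape so the example does not depend on the
# encoding of the script file; it is the single character "é".
x <- "\u00e9"

# iconv(..., toRaw = TRUE) returns a list with one raw vector per input string
bytes <- unlist(iconv(x, from = "UTF-8", to = "UTF-16BE", toRaw = TRUE))

# Turn the raw bytes into upper-case hex and prepend the \u prefix
hex <- paste(sprintf("%02X", as.integer(bytes)), collapse = "")
escaped <- paste0("\\u", hex)

cat(escaped, "\n")
# prints: \u00E9
```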

stephLH