
Not sure I have all of my encoding jargon down right, but...

Let's say you have a character string and you want to replace the non-latin1 characters with their byte representations. You do this:

a <- "It’s weird calling a place home when you moved a lot as a kid"
iconv(a, from = "UTF-8", to = "latin1", sub = "byte")

And get this:

[1] It<e2><80><99>s weird calling a place home when you moved a lot as a kid

Now I want to convert that string back from its encoded version and, in essence, get back the string I had originally. How do you do that?

Jaap
Christopher Costello
  • That's not possible in general. How would you know that your original string didn't contain "<80><99>"? If you're willing to assume that, then do the following: split the string into characters, convert the ones that aren't hex, convert again to raw, put the hex into the raw, convert back to a string. Lots of work for a questionable result. – user2554330 Jan 02 '18 at 23:09
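
The steps in that comment can be sketched roughly as follows. This is only a minimal sketch, not a tested general solution: `restore_bytes` is a hypothetical helper name, it handles a single string, and it assumes both that the input came from `iconv(..., to = "latin1", sub = "byte")` and that the original text never contained a literal `<xx>` hex pattern. Instead of splitting into single characters, it splits on the `<xx>` escapes, which amounts to the same thing.

```r
# Hypothetical helper: undo iconv(x, "UTF-8", "latin1", sub = "byte")
# for one string. Assumes every "<xx>" in the input is an escaped byte.
restore_bytes <- function(x) {
  stopifnot(is.character(x), length(x) == 1L)
  # Text that survived the conversion is latin1; bring it back to UTF-8
  x <- iconv(x, from = "latin1", to = "UTF-8")
  # Find the <xx> escapes and the plain text between them
  m <- gregexpr("<[0-9a-f]{2}>", x)
  escapes <- regmatches(x, m)[[1]]
  pieces <- regmatches(x, m, invert = TRUE)[[1]]
  # Rebuild the byte sequence: plain text as raw, escapes as single bytes
  out <- charToRaw(pieces[1])
  for (i in seq_along(escapes)) {
    byte <- as.raw(strtoi(substr(escapes[i], 2, 3), base = 16L))
    out <- c(out, byte, charToRaw(pieces[i + 1]))
  }
  res <- rawToChar(out)
  Encoding(res) <- "UTF-8"  # the reassembled bytes are UTF-8
  res
}

a <- "It\u2019s weird calling a place home when you moved a lot as a kid"
b <- iconv(a, from = "UTF-8", to = "latin1", sub = "byte")
restored <- restore_bytes(b)
```

For a whole column such as `tweets$text`, something like `vapply(x, restore_bytes, character(1), USE.NAMES = FALSE)` should apply it element by element. The caveat from the comment still holds: a string that originally contained text like `<80>` would be silently corrupted.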

0 Answers