I accidentally converted the columns of Chinese characters in a tab-delimited text file to escaped Unicode code points. The records now look like this:

<U+5ECA><U+574A><U+5E02>

How do I convert that to this?

廊坊市

You can recreate the first 6 lines of my data frame in R with this code:

structure(list(
  City_Code = c(110000L, 110000L, 110000L, 110000L, 110000L, 110000L),
  Origin_City = c("<U+5ECA><U+574A><U+5E02>", "<U+4FDD><U+5B9A><U+5E02>", "<U+5929><U+6D25><U+5E02>", "<U+5F20><U+5BB6><U+53E3><U+5E02>", "<U+627F><U+5FB7><U+5E02>", "<U+90AF><U+90F8><U+5E02>"),
  Origin_Province = c("<U+6CB3><U+5317><U+7701>", "<U+6CB3><U+5317><U+7701>", "<U+5929><U+6D25><U+5E02>", "<U+6CB3><U+5317><U+7701>", "<U+6CB3><U+5317><U+7701>", "<U+6CB3><U+5317><U+7701>"),
  Destination_City = c("<U+5317><U+4EAC>", "<U+5317><U+4EAC>", "<U+5317><U+4EAC>", "<U+5317><U+4EAC>", "<U+5317><U+4EAC>", "<U+5317><U+4EAC>"),
  Percentage = c("28.08%", "6.86%", "5.70%", "3.38%", "3.05%", "2.76%"),
  Date = c("2020-03-13", "2020-03-13", "2020-03-13", "2020-03-13", "2020-03-13", "2020-03-13")
), row.names = c("1", "2", "3", "4", "5", "6"), class = "data.frame")
David M
  • Try `stringi::stri_trans_general(mycol, "zh")`. It would be great if you could provide `dput(head(yourdataframe))` so that someone can help properly. – PKumar Mar 17 '20 at 17:04
  • @PKumar If I run `stringi::stri_trans_general(data$Origin_City[1], "zh")`, I get `[1] ""`. – David M Mar 17 '20 at 18:36
  • @PKumar I added the code you suggested. – David M Mar 17 '20 at 18:44

1 Answer

This code will convert the string to the appropriate Chinese characters:

library(stringi)
string <- '<U+5ECA><U+574A><U+5E02>'
# Rewrite each <U+XXXX> token as a \uXXXX escape, then let stringi decode it
cat(stri_unescape_unicode(gsub("<U\\+(....)>", "\\\\u\\1", string)))
# Output: 廊坊市

Source: Convert unicode to readable characters in R
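
The one-string conversion above can probably be extended to whole columns of the data frame from the question. What follows is only a sketch: `decode_u_escapes` is a hypothetical helper name, the two-row `df` is a cut-down stand-in for the real data, and the pattern is tightened to exactly four hex digits on the assumption that every escape in the file has that form.

```r
library(stringi)

# Hypothetical helper (not part of stringi): turns "<U+XXXX>" tokens into characters.
decode_u_escapes <- function(x) {
  # "<U+5ECA>" becomes "\u5ECA", which stri_unescape_unicode() then decodes
  stri_unescape_unicode(gsub("<U\\+([0-9A-Fa-f]{4})>", "\\\\u\\1", x))
}

# Cut-down stand-in for the data frame in the question
df <- data.frame(
  Origin_City = c("<U+5ECA><U+574A><U+5E02>", "<U+4FDD><U+5B9A><U+5E02>"),
  Destination_City = c("<U+5317><U+4EAC>", "<U+5317><U+4EAC>"),
  Percentage = c("28.08%", "6.86%"),
  stringsAsFactors = FALSE
)

# Decode only the columns known to hold escaped text
cols <- c("Origin_City", "Destination_City")
df[cols] <- lapply(df[cols], decode_u_escapes)
```

Restricting the call to named columns keeps `stri_unescape_unicode()` away from fields that might contain literal backslashes, which it would otherwise try to interpret as escapes.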

David M