
I would like to decode a UTF-8 string. In my data the bytes are separated by =, but for the encoding functions in R I need them separated by \x.

string <- "=2E=30=31=20=52=C3=A9=70=6F=6E=64=65=75=72"
x <-  gsub("=", "\x", string)
Encoding(x)
Encoding(x) <- "latin1"
x

I tried escaping with one, two, and three backslashes, wrapping the replacement in round and square brackets, adding quotes, and setting fixed=F. I also read several related questions and still have no clue how to do it.

Expected output:

.01 Répondeur

When I use two backslashes, as Wiktor suggests, and check with cat(), there is only one backslash in the output, but it has no effect on Encoding(). It only works when I type the \x escapes by hand.

Edit:

For example, when I do this, the result contains two backslashes and the encoding doesn't work:

> gsub("=", "\\x", string, fixed=TRUE)
[1] "\\x2E\\x30\\x31\\x20\\x52\\xC3\\xA9\\x70\\x6F\\x6E\\x64\\x65\\x75\\x72"

The same with the suggestion from Aleksandr Voitov:

> gsub("=", "\\\\x", string)
[1] "\\x2E\\x30\\x31\\x20\\x52\\xC3\\xA9\\x70\\x6F\\x6E\\x64\\x65\\x75\\x72"
and-bri

3 Answers

x <- "=2E=30=31=20=52=C3=A9=70=6F=6E=64=65=75=72"  # string data
x <- strsplit(x, "=", useBytes = FALSE )[[1]]       # split string
x <- x[nchar(x) > 0]                               # remove elements with 0 character length

using strtoi

# convert string to integer and convert integer to raw and then to character
rawToChar( as.raw( strtoi(x, base = 16L) ) )                  
# [1] ".01 Répondeur"

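The ?strtoi documentation page says:

octal constants (prefix 0 not followed by x or X) and hexadecimal constants (prefix 0x or 0X) are interpreted as base 8 and 16

The values here carry no 0x prefix, so base = 16L is passed explicitly.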

using as.hexmode to cast character to hexadecimal format

rawToChar( as.raw( as.hexmode( x ) ) )
# [1] ".01 Répondeur"
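
The split-and-convert steps above can be wrapped into one small function. This is only a sketch: decode_hex_bytes is a hypothetical name, not part of base R, and marking the result as UTF-8 is an assumption about the input data (it matters mainly when the session locale is not UTF-8).

```r
# Hypothetical helper: decode a string of "=XX" hex bytes into text.
decode_hex_bytes <- function(s) {
  parts <- strsplit(s, "=", fixed = TRUE)[[1]]         # split on the literal "="
  parts <- parts[nzchar(parts)]                        # drop the empty leading element
  out <- rawToChar(as.raw(strtoi(parts, base = 16L)))  # hex -> integer -> raw -> character
  Encoding(out) <- "UTF-8"                             # assumption: the bytes are UTF-8
  out
}

decoded <- decode_hex_bytes("=2E=30=31=20=52=C3=A9=70=6F=6E=64=65=75=72")
print(decoded)
# [1] ".01 Répondeur"
```

The result has 13 characters but 14 bytes, because the C3 A9 pair encodes the single character é.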
Sathish

You may use gsub("=", "\\x", string, fixed=TRUE) to replace = with \x, and then parse the resulting string:

string <- "=2E=30=31=20=52=C3=A9=70=6F=6E=64=65=75=72"
x <- parse(text = paste0("'", gsub("=", "\\x", string, fixed=TRUE), "'"))
x[[1]]
## => ".01 Répondeur"

See the online R demo.
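
The reason a parse step is needed can be illustrated with a short sketch: \x escapes are resolved by the R parser when a string literal is read, so a string built at run time by gsub only contains a literal backslash followed by x. The variable names below are illustrative, and the two-byte input is a shortened stand-in for the full string.

```r
typed  <- "\x2E\x30"                                  # escapes resolved by the parser: 2 characters
subbed <- gsub("=", "\\x", "=2E=30", fixed = TRUE)    # literal backslash + x: 8 characters

nchar(typed)    # 2
nchar(subbed)   # 8

# Wrapping the text in quotes and parsing it lets the parser resolve the escapes.
parsed <- parse(text = paste0("'", subbed, "'"))[[1]]
identical(parsed, typed)   # TRUE
```

Because the text is handed to the parser, this approach is probably best kept to input that is known not to contain quote characters.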

Here is another solution based on the Unicode package:

> library(Unicode)
> string <- "=2E=30=31=20=52=C3=A9=70=6F=6E=64=65=75=72"
> x1 <- gsub("=", " U+", string, fixed=TRUE)
> y <- unlist(strsplit(trimws(x1), "\\s+"))
> intToUtf8(as.u_char_seq(y))
[1] ".01 Répondeur"

Here, I replaced every = with a space followed by U+, then trimmed the result and split it on runs of one or more whitespace characters. intToUtf8(as.u_char_seq(y)) creates a Unicode string from the Unicode character sequence.
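
One point that may be worth checking when going through code points: base R's intToUtf8 treats each integer as a Unicode code point, not as one byte of a UTF-8 sequence. The sketch below uses base R only (it does not call the Unicode package) and compares the two interpretations for the C3 A9 pair from the input. The variable names are illustrative.

```r
bytes <- c(0xC3, 0xA9)                      # the "=C3=A9" pair from the input

as_code_points <- intToUtf8(bytes)          # read as code points U+00C3 and U+00A9
as_utf8_bytes  <- rawToChar(as.raw(bytes))  # read as two bytes of one UTF-8 sequence
Encoding(as_utf8_bytes) <- "UTF-8"          # assumption: the data is UTF-8

nchar(as_code_points, type = "chars")       # 2
nchar(as_utf8_bytes, type = "chars")        # 1
utf8ToInt(as_utf8_bytes)                    # 233, i.e. U+00E9
```

If the input holds UTF-8 bytes, as it appears to here, the byte-based reading is the one that yields a single é.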

Wiktor Stribiżew
string <- "=2E=30=31=20=52=C3=A9=70=6F=6E=64=65=75=72"
x <-  gsub("=", "\\\\x ", string)
Aleksandr