
Before voting to close as a duplicate, please make sure the other question actually answers my particular question here. Questions may look similar, but I haven't found an answer to mine. Thank you.


I am looking for a way to convert an arbitrary scalar character string into its HTML-encoded form. I do not want to encode just <, ", etc., but every character of the text.

So a text of the form

"<abc at def.gh>"

should be encoded as

"&#x3c;&#x61;&#x62;&#x63;&#x20;&#x61;&#x74;&#x20;&#x64;&#x65;&#x66;&#x2e;&#x67;&#x68;&#x3e;"

My goal is compatibility with how CRAN encodes maintainers' email addresses. So < should not become &lt;, it should become &#x3c;. Similarly, . should not become &period;, it should become &#x2e;.

To see this on CRAN, visit the CRAN page of any package, e.g. https://cran.r-project.org/package=curl, then use "view source" and find the Maintainer field there.

I am looking for a lightweight solution that requires as few dependencies as possible; it doesn't have to be fast.

For reference, here is an online tool to decode an encoded string: https://onlineasciitools.com/convert-html-entities-to-ascii

jangorecki

1 Answer


Here is something quick (not thoroughly tested). It was inspired by another SO answer.

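foo <- function(x) {
  # Convert the (UTF-8) string to integer code points, then to hex
  intvalues <- as.hexmode(utf8ToInt(enc2utf8(x)))
  # Wrap each hex value as a numeric character reference and concatenate
  paste(paste0("&#x", intvalues, ";"), collapse = "")
}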

all.equal(
  foo("<abc at def.gh>"),
  "&#x3c;&#x61;&#x62;&#x63;&#x20;&#x61;&#x74;&#x20;&#x64;&#x65;&#x66;&#x2e;&#x67;&#x68;&#x3e;"
)
# [1] TRUE
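
A possible variation, offered only as a sketch: format the code points with sprintf() instead of going through the hexmode class, and loop over the input so that a character vector of any length is accepted. The helper names html_encode_all and html_decode_hex are hypothetical and not from the original answer. The decoder assumes its input is a single string made up of hexadecimal references of the form &#x..; and is included only to check the round trip in base R.

```r
# Hypothetical helper: encode every code point of each string as a
# hexadecimal numeric character reference (base R only).
html_encode_all <- function(x) {
  vapply(
    enc2utf8(x),
    function(s) {
      if (is.na(s)) return(NA_character_)
      # utf8ToInt() gives one integer per code point; %x prints lowercase hex
      paste(sprintf("&#x%x;", utf8ToInt(s)), collapse = "")
    },
    character(1),
    USE.NAMES = FALSE
  )
}

# Hypothetical helper: decode a single string consisting of &#x..; references.
html_decode_hex <- function(x) {
  # Extract the hex digits between "&#x" and ";"
  hex <- regmatches(x, gregexpr("(?<=&#x)[0-9a-fA-F]+(?=;)", x, perl = TRUE))[[1]]
  # Parse as base-16 integers and turn the code points back into one string
  intToUtf8(strtoi(hex, base = 16L))
}

encoded <- html_encode_all("<abc at def.gh>")
encoded
# [1] "&#x3c;&#x61;&#x62;&#x63;&#x20;&#x61;&#x74;&#x20;&#x64;&#x65;&#x66;&#x2e;&#x67;&#x68;&#x3e;"
html_decode_hex(encoded)
# [1] "<abc at def.gh>"
```

Using sprintf() keeps the output independent of how hexmode values are converted to character, and the vapply() loop is only there because utf8ToInt() accepts a single string at a time.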
s_baldur