Convert an HTML-encoded string to a human-readable string in Nashorn

Question

I have some strings like this (encoded as UTF-8):

&#x62A;&#x648;&#x633;&#x639;&#x647;

I want to convert them to:

توسعه

How can I do that in JavaScript?

The solution needs to be compatible with Nashorn, since I am running the code in a virtual engine in Java.

NOTE: The answers to "HTML Entity Decode" and "Unescape HTML entities in Javascript?" are not acceptable for my question, since they do not work in Nashorn.

P.S.: I have searched for possible solutions, and many people suggested decodeURIComponent(escape(window.atob(yourString))) (with slight variations). That does not work either; I tried it in VS Code (JavaScript).

score 0 · Answer 1 · answered Mar 19 '20 at 17:53

It is unclear whether Nashorn supports DOM methods, but typically you can do:

var x = '&#x62A;&#x648;&#x633;&#x639;&#x647;'
var y = document.createElement("div")
y.innerHTML = x;
console.log(y.textContent)

epascarello

Unfortunately, this does not work. – Arman Mar 20 '20 at 05:59

Arman · Accepted Answer · 2020-03-21T07:47:02.510

The string I mentioned in the question can be broken down into smaller parts separated by ;. Each part is a combination of &# and a hex number (e.g. x62A) corresponding to a character (ت).

The following code does the job by parsing the input str and finding the corresponding characters. The result is the concatenation of those characters.

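var human_readable = function (str) {
    // Collect hex entities such as "&#x62A;" and single whitespace characters.
    // Any other text in str is ignored.
    var parts = str.match(/&#x[0-9a-fA-F]+;|\s/g) || [];
    var s = '';
    for (var j = 0; j < parts.length; j++) {
        var ch;
        if (/^\s$/.test(parts[j])) {
            // Whitespace is copied through unchanged.
            ch = parts[j];
        } else {
            // Strip the leading "&#x" and the trailing ";" and parse the rest as hex.
            var int_code = parseInt(parts[j].slice(3, -1), 16);
            ch = String.fromCharCode(int_code);
        }
        s = s + ch;
    }
    return s;
};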

console.log(human_readable('&#x62A;&#x648;&#x633;&#x639;&#x647;'))

P.S.: I have assumed that any whitespace in str appears literally (as ' ') and not as its numeric character reference.
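
For inputs that mix plain text with entities, or that use decimal references such as &#1578;, a single replace call with a callback may be easier to maintain. The following is only a sketch under a few assumptions: the function name decodeNumericEntities is made up for illustration, only numeric references are handled (named ones like &amp; are left as they are), and the code sticks to ES5 so that it should run on Nashorn.

```javascript
// Hypothetical helper (not from the original answer); ES5 syntax only.
function decodeNumericEntities(str) {
    // Group 1 captures hex digits (&#x62A;), group 2 captures decimal digits (&#1578;).
    return str.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, function (match, hex, dec) {
        var code = hex ? parseInt(hex, 16) : parseInt(dec, 10);
        // Leave references beyond the Unicode range untouched.
        if (code > 0x10FFFF) {
            return match;
        }
        if (code <= 0xFFFF) {
            return String.fromCharCode(code);
        }
        // Code points above U+FFFF need a UTF-16 surrogate pair,
        // because String.fromCharCode works on 16-bit code units.
        code -= 0x10000;
        return String.fromCharCode(0xD800 + (code >> 10), 0xDC00 + (code & 0x3FF));
    });
}

var decoded = decodeNumericEntities('&#x62A;&#x648;&#x633;&#x639;&#x647;');
// decoded is 'توسعه'
```

Text outside the matched references, including whitespace, passes through unchanged, so no special case for spaces is needed.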
