Even if the HTML entities are already in that string, one way or another you need to replace them with the actual characters they represent or with their escape notation equivalents.
If they were not in the string already, one option would be to just look them up:
Or calculate them:
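For numeric entities the calculation is straightforward, because the entity already carries the code: &#8451; holds the decimal value 8451 and &#x2103; holds the same value in hexadecimal. A minimal sketch of that conversion (the helper name numericEntityToHex is hypothetical, not part of any library):

```javascript
// Hypothetical helper: extracts the code from a numeric HTML entity
// and returns it as an uppercase hexadecimal string.
function numericEntityToHex(entity) {
  const match = /^&#(x[0-9a-f]+|[0-9]+);$/i.exec(entity);

  // Not a numeric entity (for example "&deg;"):
  if (!match) return null;

  const body = match[1];
  const code = body[0].toLowerCase() === 'x'
    ? parseInt(body.slice(1), 16)
    : parseInt(body, 10);

  return code.toString(16).toUpperCase();
}

console.log(numericEntityToHex('&#8451;')); // "2103"
console.log(numericEntityToHex('&#xB0;'));  // "B0"
console.log(numericEntityToHex('&deg;'));   // null
```

Named entities such as &deg; don't contain their code, so those still have to be looked up.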
Or, if you can type or copy-paste the original character from somewhere else, you can get its decimal Unicode code using String.prototype.charCodeAt(), which returns the UTF-16 code unit at the given index as a decimal number, and then Number.prototype.toString(), using its radix parameter to convert that decimal to hexadecimal:
'°'.charCodeAt(0); // 176
'°'.charCodeAt(0).toString(16); // "b0"
And then use the escape notation to represent them with their Unicode code. Note that, depending on the code, we use the \xXX notation (two hexadecimal digits, for codes up to FF) or the \uXXXX notation (four hexadecimal digits):
const str = `\u200C \xA0 \xB0 \u2103`;
console.log(str);
console.log(str.split(' ').map(s => `${ s.charCodeAt(0) } = ${ s.charCodeAt(0).toString(16) }`));
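If you'd rather not pick the notation by hand, something like this could generate it for you. It's only a sketch: toEscape is a made-up helper name, and it only looks at the first UTF-16 code unit, so characters outside the Basic Multilingual Plane are not handled:

```javascript
// Hypothetical helper: returns the escape notation for the first
// UTF-16 code unit of `char`, as source text (with a literal backslash).
function toEscape(char) {
  const code = char.charCodeAt(0);
  const hex = code.toString(16).toUpperCase();

  // \xXX takes exactly 2 hex digits and \uXXXX exactly 4, so pad with zeros:
  return code <= 0xFF
    ? `\\x${ hex.padStart(2, '0') }`
    : `\\u${ hex.padStart(4, '0') }`;
}

console.log(toEscape('°'));  // \xB0
console.log(toEscape('℃'));  // \u2103
console.log(toEscape('\t')); // \x09
```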
In your case, you need to parse that string, extract the entities and replace them with the actual character they represent.
I've made this snippet so that you can just paste characters or write HTML entities and get their Unicode codes, but it will also serve as an example of how to dynamically parse those HTML entities:
const sandbox = document.getElementById('sandbox');
const input = document.getElementById('input');
const list = document.getElementById('list');
function parseInput() {
  let text = input.value;

  // Match one entity at a time: named (&deg;), decimal (&#8451;) or hexadecimal (&#x2103;):
  (text.match(/&#?\w+;/g) || []).forEach(entity => {
    // Insert the HTML entity as HTML in an HTML element:
    sandbox.innerHTML = entity;

    // Retrieve the HTML element's textContent to get the parsed entity (the actual character):
    text = text.replace(entity, () => sandbox.textContent);
  });

  list.innerHTML = text.split('').map(char => {
    const dec = char.charCodeAt(0);
    const hex = dec.toString(16).toUpperCase();

    // \xXX takes exactly 2 hex digits and \uXXXX exactly 4, so pad with zeros:
    const code = dec <= 0xFF ? `\\x${ hex.padStart(2, '0') }` : `\\u${ hex.padStart(4, '0') }`;

    // The link uses the bare hexadecimal code:
    const link = hex;

    return `
      <li>
        <div>${ char }</div>
        <div>${ dec }</div>
        <div>${ hex }</div>
        <div><a href="http://www.fileformat.info/info/unicode/char/${ link }">${ code }</a></div>
      </li>
    `;
  }).join('');
}
input.value = '&zwnj;&nbsp;&deg;&#8451;';
input.oninput = parseInput;
parseInput();
body {
  margin: 0;
  padding: 8px;
  font-family: monospace;
}

#input {
  margin-bottom: 16px;
  border-radius: 2px;
  border: 0;
  padding: 8px;
  font-family: monospace;
  font-size: 16px;
  font-weight: bold;
  box-shadow: 0 0 32px rgba(0, 0, 0, .25);
  width: 100%;
  box-sizing: border-box;
  height: 40px;
  outline: none;
}

#sandbox {
  display: none;
}

#list {
  list-style: none;
  margin: 0;
  padding: 0;
  border-top: 1px solid #EEE;
}

#list > li {
  display: flex;
  border-bottom: 1px solid #EEE;
}

#list > li > div {
  width: 25%;
  box-sizing: border-box;
  padding: 8px;
}

#list > li > div + div {
  border-left: 1px solid #EEE;
}
<div id="sandbox"></div>
<input type="text" id="input" />
<ul id="list"></ul>
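The snippet above relies on the browser to parse the entities. If you need to do the same somewhere without a DOM, the replacement step could be sketched with a regular expression instead. Note that both decodeEntities and the NAMED_ENTITIES table are hypothetical names I'm using for illustration, and the table only contains a handful of entries, while the real list of named entities has thousands:

```javascript
// Assumed, deliberately tiny table of named entities (name => code point):
const NAMED_ENTITIES = new Map([
  ['zwnj', 0x200C],
  ['nbsp', 0xA0],
  ['deg', 0xB0],
  ['amp', 0x26],
]);

// Hypothetical helper: replaces numeric entities and the named entities
// listed above with the characters they represent.
function decodeEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z][a-z0-9]*);/gi, (entity, body) => {
    let code;

    if (/^#x/i.test(body)) {
      code = parseInt(body.slice(2), 16); // Hexadecimal: &#x2103;
    } else if (body[0] === '#') {
      code = parseInt(body.slice(1), 10); // Decimal: &#8451;
    } else {
      code = NAMED_ENTITIES.get(body);    // Named: &deg;
    }

    // Leave unknown or out-of-range entities untouched:
    if (code === undefined || code > 0x10FFFF) return entity;

    return String.fromCodePoint(code);
  });
}

const decoded = decodeEntities('&zwnj;&nbsp;&deg;&#8451;');

console.log([...decoded].map(char => char.codePointAt(0).toString(16)));
// [ '200c', 'a0', 'b0', '2103' ]
```

Entities that are not in the table are left as they are, so you can tell which ones still need to be added.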