I am trying to unescape certain HTML entities, and for that I modified a short piece of code I found on SO (which does the opposite). The problem is that the hex and decimal forms of escaped characters seem to allow an arbitrary number of leading zeros. Naturally, I want to ignore these zeros so that I look up the same character whether the input is &#x0022; or just &#x22;. Currently my code looks like this:
var entityMapRev = {
    'amp': '&',
    'lt': '<',
    'gt': '>',
    'quot': '"',
    '#34': '"',
    '#38': '&',
    '#39': '\'',
    '#60': '<',
    '#62': '>',
    '#91': '[',
    '#93': ']',
    '#x22': '"',
    '#x26': '&',
    '#x27': '\'',
    '#x2F': '/',
    '#x60': '`',
    '#x3C': '<',
    '#x3D': '=',
    '#x3E': '>',
    '#x5B': '[',
    '#x5D': ']',
};
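function unescapeHtml (string) {
    return String(string)
        // Decimal references such as &#34; or &#0034; (0* swallows the leading zeros)
        .replace(/&#0*(\d+?);/g, function (s, p1) {
            return entityMapRev['#' + p1] || s;
        })
        // Hex references such as &#x2F; or &#x002f; (upper-cased to match the table keys)
        .replace(/&#x0*([0-9A-Fa-f]+?);/g, function (s, p1) {
            return entityMapRev['#x' + p1.toUpperCase()] || s;
        })
        // Named entities such as &amp; or &lt; (unknown names are left as-is, not "undefined")
        .replace(/&([A-Za-z0-9]+);/g, function (s, p1) {
            return entityMapRev[p1] || s;
        });
}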
As you can see, it first tries the decimal representations, then the hex ones, then the named entities.
Now my question is: can I do this in a single regex? I know how to build one that matches either form, but then the capture groups still include the zeros, which breaks my current lookup.
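One possible single-regex approach, offered as a sketch rather than a tested solution: give each form (hex, decimal, named) its own capture group and put the `0*` outside the hex and decimal groups, so the zeros are consumed by the match but never captured. The lookup table is the one from the question; `ENTITY_RE` is an illustrative name, not anything from the original code.

```javascript
// Reverse lookup table from the question.
var entityMapRev = {
  'amp': '&', 'lt': '<', 'gt': '>', 'quot': '"',
  '#34': '"', '#38': '&', '#39': '\'', '#60': '<', '#62': '>',
  '#91': '[', '#93': ']',
  '#x22': '"', '#x26': '&', '#x27': '\'', '#x2F': '/', '#x60': '`',
  '#x3C': '<', '#x3D': '=', '#x3E': '>', '#x5B': '[', '#x5D': ']'
};

// Hypothetical combined regex, one alternative per form:
//   hex:     &#x0022;  -> group 1 = "22"
//   decimal: &#0034;   -> group 2 = "34"
//   named:   &quot;    -> group 3 = "quot"
// The 0* sits OUTSIDE groups 1 and 2, so leading zeros are never captured.
var ENTITY_RE = /&#[xX]0*([0-9A-Fa-f]+);|&#0*(\d+);|&([A-Za-z][A-Za-z0-9]*);/g;

function unescapeHtml(string) {
  return String(string).replace(ENTITY_RE, function (match, hex, dec, name) {
    var key;
    if (hex) {
      // Table keys use upper-case hex digits ('#x2F'), so normalize case.
      key = '#x' + hex.toUpperCase();
    } else if (dec) {
      key = '#' + dec;
    } else {
      key = name;
    }
    // Leave anything not in the table untouched.
    return Object.prototype.hasOwnProperty.call(entityMapRev, key)
      ? entityMapRev[key]
      : match;
  });
}
```

Because everything happens in one `replace` call, text produced by one replacement is never scanned again: `unescapeHtml('&#38;#34;')` yields `&#34;`, whereas separate passes give a later pass the chance to decode that result a second time.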
If that is not possible, does anyone know a better way to do this?
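A different approach, sketched under the assumption that an ES2015+ engine is available (for `String.fromCodePoint`): keep the table only for named entities and convert numeric references arithmetically with `parseInt`, which ignores leading zeros, so `&#x0022;` and `&#x22;` both become code point 34. The names `namedEntities`, `NUMERIC_ENTITY_RE` and `unescapeHtmlNumeric` are hypothetical.

```javascript
// Named entities from the question's table; numeric references are
// converted with parseInt instead of being looked up.
var namedEntities = {
  'amp': '&',
  'lt': '<',
  'gt': '>',
  'quot': '"'
};

// Same three alternatives, but the digits (zeros included) are captured,
// because parseInt makes the zeros irrelevant.
var NUMERIC_ENTITY_RE = /&#[xX]([0-9A-Fa-f]+);|&#(\d+);|&([A-Za-z][A-Za-z0-9]*);/g;

function unescapeHtmlNumeric(string) {
  return String(string).replace(NUMERIC_ENTITY_RE, function (match, hex, dec, name) {
    if (hex || dec) {
      // parseInt('0022', 16) and parseInt('22', 16) are both 34.
      var code = hex ? parseInt(hex, 16) : parseInt(dec, 10);
      // String.fromCodePoint throws a RangeError above U+10FFFF; keep such input as-is.
      if (code > 0x10FFFF) {
        return match;
      }
      return String.fromCodePoint(code);
    }
    return Object.prototype.hasOwnProperty.call(namedEntities, name)
      ? namedEntities[name]
      : match;
  });
}
```

Unlike the lookup table, this decodes every numeric reference rather than a fixed whitelist, so if only certain characters should be unescaped, `code` would need to be checked against that set first. It also does not apply the HTML parser's special cases (for example `&#0;` becoming U+FFFD, or the remapping of 0x80 to 0x9F).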