I am trying to unescape certain HTML entities, and for that I modified a short piece of code I found somewhere on SO (which does the opposite). The problem is that the hex and decimal representations of escaped characters seem to be allowed an arbitrary number of leading zeros. Naturally I want to strip these zeros so that I look up the same character whether it's &#x0022; or just &#x22;. Currently my code looks like this:

var entityMapRev = {
    'amp': '&',
    'lt': '<',
    'gt': '>',
    'quot': '"',
    '#34': '"',
    '#38': '&',
    '#39': '\'',
    '#60': '<',
    '#62': '>',
    '#91': '[',
    '#93': ']',
    '#x22': '"',
    '#x26': '&',
    '#x27': '\'',
    '#x2F': '/',
    '#x60': '`',
    '#x3C': '<',
    '#x3D': '=',
    '#x3E': '>',
    '#x5B': '[',
    '#x5D': ']',
};
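function unescapeHtml (string) {
    // Keep the original text when an entity is not in the lookup table.
    function lookup (key, original) {
        return entityMapRev.hasOwnProperty(key) ? entityMapRev[key] : original;
    }
    // Decimal first, then hex, then named entities; leading zeros are skipped.
    return String(string).replace(/&#0*(\d+);/g, function (s, p1) {
        return lookup('#' + p1, s);
    }).replace(/&#x0*([0-9a-f]+);/gi, function (s, p1) {
        // The map stores hex keys in upper case (e.g. '#x2F').
        return lookup('#x' + p1.toUpperCase(), s);
    }).replace(/&(\w+);/g, function (s, p1) {
        return lookup(p1, s);
    });
}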

As you can see, it first tries the decimal representations, then the hex ones, then the named entities.

Now my question is: can I do that in a single regex? I know how to make one that matches either form, but then the captured groups still include the leading zeros, which doesn't work with my current lookup.

If that is not possible, does anyone know a better way to do this?

Consti P
  • Is using regex, if doable with a single regex, *required*? It's possible (just expand the replacer function's logic, basically), but there's a much, much simpler way https://stackoverflow.com/questions/1912501/unescape-html-entities-in-javascript – CertainPerformance Apr 25 '19 at 07:01
  • @CertainPerformance Interesting link, maybe I'll look into that. And no, no single regex required, I just thought that 3 chained regex were not exactly best for performance... – Consti P Apr 25 '19 at 07:22
  • Never worry about performance until you've actually noticed something seems too laggy and have run a performance test to identify the bottlenecks - computers can process millions of instructions per second, after all. Clean and readable code is more important 99% of the time. (still, refactoring into a single regex *would* look nicer and DRY-er, but using `DOMParser` is even better, no need to reinvent the wheel) – CertainPerformance Apr 25 '19 at 07:24
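
The single-regex idea raised in the question and comments could be sketched roughly as follows. This is only an illustration under a few assumptions: the lookup table is a shortened copy of `entityMapRev`, the name `entityPattern` is introduced here and is not from the original post, hex digits are upper-cased before the lookup because the table stores keys such as `#x2F`, and unknown entities are left untouched. Each alternative has its own capture group, and the `0*` in front of the numeric groups consumes the leading zeros so they never reach the lookup key.

```javascript
// Shortened copy of the question's lookup table; hex keys are upper case.
var entityMapRev = {
    'amp': '&', 'lt': '<', 'gt': '>', 'quot': '"',
    '#34': '"', '#38': '&', '#39': '\'', '#60': '<', '#62': '>',
    '#x22': '"', '#x26': '&', '#x27': '\'', '#x2F': '/',
    '#x3C': '<', '#x3E': '>'
};

// One pattern with three alternatives, each with its own capture group:
//   group 1: decimal digits, leading zeros excluded
//   group 2: hex digits, leading zeros excluded
//   group 3: entity name
var entityPattern = /&(?:#0*(\d+)|#[xX]0*([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));/g;

function unescapeHtml(string) {
    return String(string).replace(entityPattern, function (match, dec, hex, name) {
        var key;
        if (dec !== undefined) {
            key = '#' + dec;                 // e.g. '&#0038;' -> '#38'
        } else if (hex !== undefined) {
            key = '#x' + hex.toUpperCase();  // e.g. '&#x002f;' -> '#x2F'
        } else {
            key = name;                      // e.g. '&lt;' -> 'lt'
        }
        // Leave entities that are not in the table exactly as they were.
        return Object.prototype.hasOwnProperty.call(entityMapRev, key)
            ? entityMapRev[key]
            : match;
    });
}

console.log(unescapeHtml('&#x0022;a&#x22; &#0038; &lt;b&gt; &#x002f; &nbsp;'));
// prints: "a" & <b> / &nbsp;
```

Besides being shorter, a single pass avoids one pitfall of chaining: with three chained `replace` calls, input such as `&#38;lt;` is first turned into `&lt;` and then into `<` by a later pass, whereas a single pass yields `&lt;` and stops there.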
