There's a webpage that mangles HTML by embedding it, entity-escaped, inside another HTML document. When I try to replace the entities with the characters they stand for, it just removes everything.

Here's the code I tried:

var marks = document.getElementsByTagName("body");
for (var i = 0, l = marks.length; i < l; i++)
{
   var mark = marks[i];
   mark.innerHTML = mark.innerHTML.replace('&lt;', '<');
   mark.innerHTML = mark.innerHTML.replace('&gt;', '>');
   mark.innerHTML = mark.innerHTML.replace('&amp;', '&');
}

but here's what it does: https://jsfiddle.net/rkb89odm/2/

Máté Burján

1 Answer

First, replace() with a string pattern only replaces the first occurrence, not all occurrences.

> '&lt;html&gt;&lt;body&gt;'.replace('&lt;', '<')
< "<html&gt;&lt;body&gt;"
> '&lt;html&gt;&lt;body&gt;'.replace(/&lt;/g, '<')
< "<html&gt;<body&gt;"
> '&lt;html&gt;&lt;body&gt;'.replace(/&lt;/g, '<').replace(/&gt;/g, '>');
< "<html><body>"

See "How to replace all occurrences of a string in JavaScript?"
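
As a side-by-side sketch of the options, the snippet below compares a string pattern, a regular expression with the g flag, and String.prototype.replaceAll. The replaceAll variant is an addition to the console session above and assumes an ES2021-capable engine; the variable names are only illustrative.

```javascript
// Input taken from the console session above.
const escaped = '&lt;html&gt;&lt;body&gt;';

// A string pattern stops after the first match.
const firstOnly = escaped.replace('&lt;', '<');

// A regular expression with the g flag replaces every match.
const viaRegex = escaped.replace(/&lt;/g, '<').replace(/&gt;/g, '>');

// replaceAll (ES2021) replaces every match of a plain string pattern.
const viaReplaceAll = escaped.replaceAll('&lt;', '<').replaceAll('&gt;', '>');

console.log(firstOnly);     // <html&gt;&lt;body&gt;
console.log(viaRegex);      // <html><body>
console.log(viaReplaceAll); // <html><body>
```

If older browsers have to be supported, the regular-expression form is the safer choice of the two.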

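Second, I would avoid round-tripping partially valid markup through the browser: every value assigned to innerHTML is parsed, then serialized again when you read it back, so it might not be the same when it comes back to you. Run all the replacements on a string first, then put the result back in innerHTML once.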

let s = mark.innerHTML;
s = s.replace(/&lt;/g, '<');
s = s.replace(/&gt;/g, '>');
s = s.replace(/&amp;/g, '&'); // last, so that "&amp;lt;" is not decoded twice
mark.innerHTML = s;
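
One way to package "run all the replacements first" is a single-pass helper, sketched below. The names ENTITIES and decodeBasicEntities are hypothetical, and the helper only handles the three entities from the question; it is not a general HTML entity decoder.

```javascript
// Hypothetical lookup table: only the three entities from the question.
const ENTITIES = { '&lt;': '<', '&gt;': '>', '&amp;': '&' };

function decodeBasicEntities(text) {
  // One pass over the string: each entity is matched and replaced exactly
  // once, so "&amp;lt;" becomes "&lt;" rather than "<".
  return text.replace(/&(?:lt|gt|amp);/g, (entity) => ENTITIES[entity]);
}

// In the page this would be applied once per element, for example:
//   mark.innerHTML = decodeBasicEntities(mark.innerHTML);
console.log(decodeBasicEntities('&lt;b&gt;Tom &amp;amp; Jerry&lt;/b&gt;'));
// prints: <b>Tom &amp; Jerry</b>
```

Because the string is scanned only once, the order of the replacements no longer matters, which is not true of a chain of separate replace calls.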
Josh Lee