
content = '5<x<div></div>'

Basically, I am looking for a regular expression that will turn a string like the one above into `5&lt;x<div></div>`.

`5x<div></div>` should stay `5x<div></div>`. I am only trying to escape unclosed HTML tags.

If a library does this, I would be happy to use it, as long as it meets my main goal of escaping unclosed HTML tags.

Dean Christian Armada
  • Look for it: https://css-tricks.com/snippets/javascript/htmlentities-for-javascript/ – adampweb Dec 13 '20 at 09:36
  • No @AdamP., that is not what I am looking for. That will replace everything, even HTML tags. It should only replace unclosed HTML tags, turning them into literal less-than or greater-than signs – Dean Christian Armada Dec 13 '20 at 09:38
  • Well, we all know about [parsing HTML with regex](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)... – trincot Dec 13 '20 at 09:54
  • Why do you have this ambiguous string in the first place? Fixing ambiguous syntax after the fact is by definition… difficult. – deceze Dec 13 '20 at 09:55
  • It is a requirement. I am trying to minify the HTML produced by NuxtJS static generate, but it fails because stray `>` and `<` characters act as unclosed HTML tags – Dean Christian Armada Dec 13 '20 at 09:58
  • Then the “requirement” is somewhat removed from sane reality… – deceze Dec 13 '20 at 09:59
  • So @deceze, it is really that bad :) – Dean Christian Armada Dec 13 '20 at 10:00
  • Give me a regex answer and I will give you some HTML for which it will fail. Forget it. – trincot Dec 13 '20 at 10:00
  • You are certainly not the first person to try to minify NuxtJS static HTML. I'd suggest you find the reason for the invalid output instead of trying to fix it symptomatically. – idmean Dec 13 '20 at 10:01
  • The solution below is good, but I also asked the person who writes those requirements to type `&lt;` and `&gt;` manually – Dean Christian Armada Dec 13 '20 at 11:54

1 Answer

  1. Prefix each opening-tag character "<" with a unique marker, in this case ",,#*&,,".
  2. Split the string at that marker.
  3. The "replaceString()" function checks whether each resulting piece is really a tag, that is, whether it contains both "<" and ">". If it does not, the character is rewritten as "&lt;".
  4. The whole process is repeated for ">", with the marker placed after the character and the replacement being "&gt;".

This is not the most elegant solution to this task, but it works.
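To make steps 1 to 3 concrete, here is a minimal sketch of just the "<" pass, showing the intermediate pieces. The names `MARKER`, `splitBeforeLessThan`, and `looksLikeTag` are made up for illustration; only the marker value comes from the answer.

```javascript
// Marker value taken from the answer; it is assumed never to occur in the input.
const MARKER = ',,#*&,,';

// Hypothetical helper covering steps 1 and 2 for the "<" character.
function splitBeforeLessThan(input) {
  // Step 1: put the marker in front of every "<".
  // Step 2: split on the marker, so every piece after the first starts with "<".
  return input.replace(/</g, MARKER + '<').split(MARKER);
}

// Step 3 (the check only): a piece counts as a tag if it has both brackets.
const looksLikeTag = (piece) => piece.includes('<') && piece.includes('>');

const pieces = splitBeforeLessThan('5<x<div>s>7</div>');
console.log(pieces);
// pieces: ['5', '<x', '<div>s>7', '</div>']
console.log(pieces.map(looksLikeTag));
// flags:  [false, false, true, true]
```

Only `'<x'` contains a "<" without a matching ">", so it is the only piece whose "<" gets escaped in this pass.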

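var str = '5<x<div>s>7</div>';
var sep = ",,#*&,,"; // unique marker, assumed not to occur in the input

// Pass 0 handles "<", pass 1 handles ">".
for (var i = 0; i < 2; i++) {
    var spl;
    if (i === 0) {
        // Marker before every "<": each piece after the first starts with "<".
        spl = str.replace(/</g, sep + "<").split(sep);
    } else {
        // Marker after every ">": each piece except the last ends with ">".
        spl = str.replace(/>/g, ">" + sep).split(sep);
    }
    replaceString(spl);
}

// Escapes the bracket in every piece that does not contain both "<" and ">",
// then writes the rejoined result back to the global str.
function replaceString(spl) {
    for (let i = 0; i < spl.length; i++) {
        var hasLt = spl[i].indexOf('<') > -1;
        var hasGt = spl[i].indexOf('>') > -1;
        if (hasLt && hasGt) {
            continue; // looks like a tag, leave it untouched
        }
        if (hasLt) {
            spl[i] = spl[i].replace(/</g, "&lt;");
        } else if (hasGt) {
            spl[i] = spl[i].replace(/>/g, "&gt;");
        }
    }
    str = spl.join('');
}

console.log(str); // 5&lt;x<div>s&gt;7</div>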
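Since the question asked for a regular expression, the same heuristic can also be sketched with lookarounds instead of a marker. This is an illustration rather than a robust HTML fix: `escapeStrayBrackets` is a made-up name, the rule "a tag is `<`, then no brackets, then `>`" is an assumption, and the lookbehind needs a reasonably recent JavaScript engine.

```javascript
// Hypothetical helper: escapes "<" and ">" that are not part of a "<...>" pair.
function escapeStrayBrackets(html) {
  return html
    // A "<" is stray if no ">" follows before the next "<" or the end of input.
    .replace(/<(?![^<>]*>)/g, '&lt;')
    // A ">" is stray if no "<" precedes it since the previous ">" or the start.
    .replace(/(?<!<[^<>]*)>/g, '&gt;');
}

console.log(escapeStrayBrackets('5<x<div></div>'));    // 5&lt;x<div></div>
console.log(escapeStrayBrackets('5x<div></div>'));     // 5x<div></div>
console.log(escapeStrayBrackets('5<x<div>s>7</div>')); // 5&lt;x<div>s&gt;7</div>

// Known limitation: text that merely looks like a tag is left alone.
console.log(escapeStrayBrackets('a < b and c > d'));   // a < b and c > d
```

The last call shows the kind of input the comments warn about: `< b and c >` has the shape of a tag, so neither this sketch nor the marker-based code above escapes it. Telling such text apart from real markup needs an actual HTML parser, or correctly escaped input in the first place.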
54ka