
I have some invalidly-nested HTML like:

    <form class="form1" method="get">
    <div>
        <input name="field1">

    </form>

    <form class="form2" method="get">
        <input name="field1">
    </form>

    </div>

Yeah, it's a mess, don't ask. The invalid nesting is causing problems somewhere else. I think jQuery is expecting a closing `</div>` and only finds it at the very end. It then treats the second `<form>` tag as invalid, discards the closing `</form>` immediately above it, and assumes everything between lines 1 and 9 is one form.

If I output these to the console:

  • $('.form1').html() - all of lines 1-9
  • $('.form2').html() - undefined

So what I'm trying to do is treat the whole thing as a string, and use regex to strip out form2. I'm expecting a regex something like:

    formText.replace(/(<form\b[^>]*>)[^<>]*(<\/form>)/gi, "");

but I'm not sure how to reference the specific form with class=form2.
There's also a problem with it being a multi-line string.
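
As a quick check of the pattern above (a sketch only; the `formText` value here is an assumption, simply the sample markup from this question pasted into a string): `[^<>]*` cannot get past the tags nested inside either form, so the pattern matches nothing and the string comes back unchanged.

```javascript
// Sample markup from the question as a plain string
// (assumed to be roughly what formText holds).
const formText = [
  '<form class="form1" method="get">',
  '<div>',
  '    <input name="field1">',
  '',
  '</form>',
  '',
  '<form class="form2" method="get">',
  '    <input name="field1">',
  '</form>',
  '',
  '</div>'
].join('\n');

const attempt = /(<form\b[^>]*>)[^<>]*(<\/form>)/gi;

// After the opening <form ...> tag, [^<>]* stops at the next "<"
// (from <div> or <input ...>), and the pattern then demands </form>,
// so neither form matches.
const result = formText.replace(attempt, '');
console.log(result === formText); // true: nothing was removed
```

Note that the negated class `[^<>]` does match newlines, so the multi-line string is not what stops this particular pattern; the nested tags are.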

Update: added more detail, outlining why jQuery's remove() method isn't working. jQuery only thinks there's one form unfortunately.

duncan
  • [Do not use regex to parse HTML/XML or any other non-regular language](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – May 12 '16 at 16:55
  • Do not fix broken HTML. Just don't allow it to break. – Frederik.L May 12 '16 at 17:25
  • @Frederik.L thanks for that useful insight. – duncan May 12 '16 at 17:26
  • I know you said to not ask, but I think now is a good time to ask. *Why is there invalid HTML?* – 4castle May 12 '16 at 17:34

2 Answers


Don't use regex to parse HTML. Since you're using jQuery, just use .remove():

    // Wait for the DOM to be ready, then remove the second form.
    $(function() {
        $(".form2").remove();
    });


4castle
  • That's nice, unfortunately it doesn't work. The HTML is worse than my question initially outlined; I'll update it to show the real problem. It's invalidly nested, so it turns out `$(".form2")` doesn't actually exist as an HTML element. – duncan May 12 '16 at 17:06
  • @duncan [It works for me.](https://jsfiddle.net/mxq4rnyd/2/) Are you sure there's nothing else causing it? Are you putting it in a ready block? – 4castle May 12 '16 at 17:25
  • I think my example HTML isn't accurate enough re: its invalid structure. I need to come up with a better example that actually demonstrates the problem! – duncan May 12 '16 at 17:27

I ended up using:

    formText = formText.replace(/(<form\b[^>]*form2+.*>[\s\S]+<\/form>)/gi, "");

The [\s\S] matches all characters including \n and \r to cover the newlines.
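
To illustrate the difference (a minimal sketch with a made-up two-line string): without the `s` flag, `.` in a JavaScript regex does not match line terminators, whereas `[\s\S]` matches any character at all.

```javascript
// Made-up sample: two characters separated by a newline.
const text = 'a\nb';

console.log(/a.b/.test(text));      // false: "." does not match "\n"
console.log(/a[\s\S]b/.test(text)); // true: \s covers "\n" and "\r"
```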

I could probably have made the part of the regex dealing with the class name more specific, so that it only matched the class attribute and not some other random form with a similar name, but in practice it didn't matter (there was only one instance of the 2nd form, with a very specific class name).
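
For reference, a rough sketch of what a more specific version might look like. This is not what was actually used: `stripFormByClass` is a hypothetical helper name, and it assumes the class attribute is quoted and that the form to remove contains no nested `</form>`. It matches the class name as a whole token inside the `class` attribute and uses a lazy `[\s\S]*?` so the match stops at the first closing tag.

```javascript
// Hypothetical helper (not from the original answer): removes every
// <form> whose quoted class attribute contains className as a whole token.
function stripFormByClass(html, className) {
  // Escape regex metacharacters so the class name is matched literally.
  const cls = className.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const pattern = new RegExp(
    // Opening tag: class="... form2 ..." with single or double quotes.
    String.raw`<form\b[^>]*\sclass\s*=\s*["'](?:[^"'>]*\s)?${cls}(?:\s[^"'>]*)?["'][^>]*>` +
    // Lazy body, so the match ends at the first closing tag.
    String.raw`[\s\S]*?</form\s*>`,
    'gi'
  );
  return html.replace(pattern, '');
}

// Made-up sample input for illustration.
const sample =
  '<form class="form1">\n<input name="a">\n</form>\n' +
  '<form class="form2 extra">\n<input name="b">\n</form>';

console.log(stripFormByClass(sample, 'form2'));
```

This is still pattern matching on markup, so an unquoted attribute such as `class=form2`, or other unusual formatting, would slip through.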

duncan
  • How are you acquiring a usable `formText` when the form is being removed from the DOM? – 4castle May 12 '16 at 17:17
  • The 2nd form isn't getting removed from the DOM. When I get `$('.form1').html()` it gives me all of lines 1-9 – duncan May 12 '16 at 17:18
  • Let's assume that the HTML is so broken that there are spaces in random, valid places. Like ``. Your regex won't recognize it, while it will show up normally in your DOM and make a mess. I still believe that dealing with code that *can* break is a practice that will get you into all kinds of trouble if anything is scaling around it. I'd do everything to get valid code, even if you need to apply pressure on some third parties. – Frederik.L May 13 '16 at 07:50