
This would be a small matter if we were using jQuery: just tag the TR and remove it, etc. But I am curious how one would do this in a strictly regex way, without manipulating the DOM.

I have this:

<tr id="anyName">
    <td>
         // a bunch of html
    </td>
</tr>

How would you use regular expressions to replace that tr and everything in it with ""?

I had this:

sometable.replace(".*?"," ");

I tried a few other things, but I cannot get this to work. Basically, I want the tr and all its contents gone. I am using a server-side templating engine that processes all this data before it reaches the view. So I can hold this data as a string in the templating logic and just remove it, but since it's a string, I need a regex to remove it.

Any ideas? I feel I am missing something very simple here.

ikegami
james emanon
  • 5
    You can use regular expressions on HTML only if the structure is tightly controlled (no recursive tags, etc). Your question does not specify how strict the structure is. See http://stackoverflow.com/a/1732454/39068 for a general case. – Andrey Shchekin Jul 02 '13 at 02:55
  • 3
    "I have a screw to screw in, but I am curious as to how one would do this using a hammer instead of a screwdriver." Unless you can guarantee the HTML follows a very strict pattern, you're asking to write an HTML parser as a regular expression pattern. While it's possible to make a regex-based parser, it's not really possible to parse HTML using a regex. Either way, writing a parser is far beyond the scope of an SO answer. – ikegami Jul 02 '13 at 03:01
  • It should be simpler to omit the not–required markup in the first place, or give it a class that sets it to *display: none*. – RobG Jul 02 '13 at 03:33
  • @ikegami If you bring the hammer down really gently with lots of frequent short taps, and hit the screw straight on, the threads will drive the screw in. –  Jul 02 '13 at 03:57
  • @LegoStormtroopr: I'm sure that's what the OP was thinking too. – Robert Harvey Jul 02 '13 at 15:16

2 Answers


The problem is that a <tr> can have nested <tr> tags inside it (through a nested table).

And in HTML the <tr> tag does not always have a closing tag (it is only required in XHTML).

So a regex won't be able to remove such tags reliably, even if you use an odd-numbered pattern in the regex.

So please don't use regex for this.


Since you are using jQuery, you can use remove:

$('tr').remove();

Or you can use a server-side HTML parser. But don't use regex; it could break your website.


To answer your question, though, you could use this regex:

 (?s)<tr[^<>]*>(.(?!<tr[^<>]*>))*?</tr>

or, in JavaScript:

 /<tr[^<>]*>(.(?!<tr[^<>]*>))*?<\/tr>/s


But wait: one replacement is not enough. You need to keep replacing until all of the matching content is gone.

So you need a loop that repeats the replacement until nothing more is replaced.

And this regex could still break in case there's no closing tag.
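The repeated replacement described above can be sketched roughly as follows. This is only an illustration, not a robust solution: the function name `removeRows` is hypothetical, the `s` (dotAll) flag assumes an ES2018-capable engine, and the input is assumed to have a closing </tr> for every row.

```javascript
// Hypothetical helper: strips <tr>...</tr> blocks from an HTML string,
// using the pattern from this answer with the g and s flags added.
function removeRows(html) {
  // Only matches a row whose body contains no further opening <tr>,
  // so nested rows are removed from the inside out.
  const innermostRow = /<tr[^<>]*>(.(?!<tr[^<>]*>))*?<\/tr>/gs;
  let previous;
  do {
    previous = html;
    html = html.replace(innermostRow, '');
  } while (html !== previous); // stop once a pass changes nothing
  return html;
}

const nested =
  '<table><tr id="a"><td><table><tr><td>x</td></tr></table></td></tr></table>';
const stripped = removeRows(nested);
```

With the nested input above, the first pass can only remove the inner row, and the second pass then removes the outer one, which is why a single replace call is not sufficient.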

Anirudha
  • 1
    +1 Not suggesting to use a regular expression, but… while the closing tag is optional, a regular expression could match everything up to, but not including, the next opening TR. Not sufficient for nested tables though. :-( Use a parser. – RobG Jul 02 '13 at 03:28

@Anirudh is correct; this can be a very difficult problem, thanks to nested <tr> tags and unclosed <tr> tags. But assuming you don't have nested or unclosed <tr> tags, you can indeed use regular expressions to solve this problem.

// if your html is in a variable called "html", you can get rid of <tr> tags thusly:
html = html.replace( /<tr[^]*?<\/tr>/, '' );

Note that we can't use .* in between the opening and closing <tr> tags. Why? Because the . wildcard does not match newlines by default, hence the "match anything" group of [^] (a JavaScript-specific idiom). Note also that we do not attempt to match the closing angle bracket in the opening <tr> tag, because there could be attributes in the tag.
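To make the difference between . and [^] concrete, here is a small sketch. The input string is made up to resemble the markup in the question, and putting the id into the pattern is an assumption about how one might target a single row; it relies on the attribute being written exactly as id="anyName".

```javascript
// Hypothetical input modeled on the question's markup.
const table =
  '<table>\n<tr id="anyName">\n  <td>stuff</td>\n</tr>\n' +
  '<tr id="other"><td>kept</td></tr>\n</table>';

// `.` cannot cross the newline after the opening tag, so nothing matches
// and the string comes back unchanged.
const withDot = table.replace(/<tr id="anyName".*?<\/tr>/, '');

// `[^]` matches any character, newlines included, so the whole row is removed.
const withAnyChar = table.replace(/<tr id="anyName"[^]*?<\/tr>/, '');
```

Only the row whose opening tag begins with `<tr id="anyName"` is removed here; the other row is left alone.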

If your HTML is well-formed, there can't be anything (other than comments) between your <tr> tags. However, you have to take care if you do want to preserve those comments (and it's a good thing to get in the habit of thinking about in general): if you use the default greedy * quantifier, it'll gobble up everything from the first opening <tr> tag to the last closing one, including comments and any invalid code in between <tr> tags. To fix this problem, make the quantifier lazy by appending a ?.
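A quick way to see the greedy versus lazy behavior is to run both forms on the same string. This is a sketch with a made-up input; the g flag on the lazy version is an addition so that every row is removed rather than just the first.

```javascript
// Hypothetical input: two rows with a comment between them.
const rows = '<tr><td>a</td></tr>\n<!-- keep me -->\n<tr><td>b</td></tr>';

// Greedy: [^]* runs to the last </tr>, taking the comment with it.
const greedy = rows.replace(/<tr[^]*<\/tr>/, '');

// Lazy: [^]*? stops at the first </tr>, so each row is matched on its own
// and the comment between the rows survives.
const lazy = rows.replace(/<tr[^]*?<\/tr>/g, '');
```

The greedy version leaves nothing behind, while the lazy version keeps the comment and the surrounding newlines.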

See my solution (in JavaScript, without jQuery) here:

http://jsfiddle.net/TpeEj/2/

Ethan Brown