How to find a match while skipping tags, using a regular expression?

Question

I want to write a regex that ignores tags appearing inside the string.

For example, here is my string:

`<p>hi this is a reg<del>U</del><ins>u</ins>lar expression match</p>`

I want a regular expression that finds "regular" in the string above. The match should cover the whole word including the tags, i.e., `reg<del>U</del><ins>u</ins>lar`.

Case can be ignored here.

Please help. Thanks in advance.

Gah! [Don't parse HTML with regex](http://stackoverflow.com/a/1732454/383609)! If you insist on using regex, what language are you doing this in? With jQuery, it's trivial to get just the text, for example. — Bojangles, Aug 21 '12 at 08:48
Ignoring the tags would mean that your string becomes `regUular` (because those tags have meaning, you know). So what exactly do you want to ignore? @JamWaffles: This also would need to be taken into account with a jQuery solution, making it nontrivial. — Tim Pietzcker, Aug 21 '12 at 08:50
@TimPietzcker I want to ignore the content inside the `<del>` tag, so my match will be just 'regular', not 'regUular'. — Hulk, Aug 21 '12 at 09:08
So is it *just* the `<del>` tag you want to ignore with its contents? And all other tags should be treated like they weren't even there? — Tim Pietzcker, Aug 21 '12 at 09:14

Tim Pietzcker · Accepted Answer · score 2 · 2012-08-21T09:53:48.060

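I don't think you can get a robust solution with a regex. At any rate, it won't be very readable. Here, in verbose form, is a regex that conforms to your revised specifications; compile it with the NSRegularExpressionAllowCommentsAndWhitespace option so the whitespace and `#` comments are ignored, and add NSRegularExpressionCaseInsensitive since case doesn't matter. Note that it fails to handle `<del>` tags that contain nested tags; that's impossible to do with NSRegularExpression.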

```
\b        # Start of word
r         # Match r
(?:       # Match either
 <del>    #  <del>
 [^<>]*   #  any characters besides angle brackets
 </del>   #  </del>
|         # or
 <[^<>]*> #  any other tag
)*        # End of alternation
e         # Match e
(?:<del>[^<>]*</del>|<[^<>]*>)*  # etc...
g
(?:<del>[^<>]*</del>|<[^<>]*>)*
u
(?:<del>[^<>]*</del>|<[^<>]*>)*
l
(?:<del>[^<>]*</del>|<[^<>]*>)*
a
(?:<del>[^<>]*</del>|<[^<>]*>)*
r
\b
```
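The pattern above spells out "regular" letter by letter, which is why it doesn't generalize on its own. Here is a minimal sketch that builds the same pattern for an arbitrary word, assuming Python's `re` module instead of NSRegularExpression (the constructs used, `\b`, `(?:...)` and negated character classes, should behave the same in both). `SKIP` and `tag_tolerant_pattern` are hypothetical names, and the word is assumed to start and end with word characters so that `\b` makes sense. It keeps the original's limitation: a `<del>` element containing nested tags is not skipped.

```python
import re

# Between two letters, allow any run of: a <del>...</del> element with no
# nested tags inside it, or any other single tag such as <ins> or </ins>.
SKIP = r"(?:<del>[^<>]*</del>|<[^<>]*>)*"


def tag_tolerant_pattern(word):
    """Hypothetical helper: compile a case-insensitive regex that matches
    `word` even when tags are interleaved between its letters."""
    body = SKIP.join(re.escape(ch) for ch in word)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


text = "<p>hi this is a reg<del>U</del><ins>u</ins>lar expression match</p>"
m = tag_tolerant_pattern("regular").search(text)
print(m.group(0))  # reg<del>U</del><ins>u</ins>lar
```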

edited Aug 21 '12 at 09:53

answered Aug 21 '12 at 08:49

Tim Pietzcker

This is fine when I am dealing with a large string like the example I gave, but it won't help in the general cases. – Hulk Aug 21 '12 at 09:10
One more thing: in my case there will only be tags between the characters, but the regex above also matches characters other than tags. – Hulk Aug 21 '12 at 09:14
@harish: I've updated the regex. It works on your example and follows your new specs closely. – Tim Pietzcker Aug 21 '12 at 10:00
Thanks, it's working fine. I will check it with different combinations and let you know if any problems come up. – Hulk Aug 21 '12 at 10:18
Hi, can you check this question: http://stackoverflow.com/questions/12474742/skip-xml-content-while-doing-regex-search-and-replace – Hulk Sep 18 '12 at 11:49
I am struggling to find the right method (or answer) for that post. Please help. – Hulk Sep 18 '12 at 11:50

score 0 · Answer 2 · answered Aug 21 '12 at 08:56

You really need some form of HTML parser here. Regexps are unsuited for HTML and you'll spend your time refining and tweaking to try and cover all the edge cases (which you just can't).
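Here is a minimal sketch of the parser route, assuming Python's standard-library `html.parser` (the class name `VisibleText` is hypothetical): it keeps text outside `<del>` elements, drops the tags themselves, and runs an ordinary search on the result. Unlike the regex, a depth counter lets it cope with tags nested inside `<del>`. The trade-off is that it tells you the word is present, not where the tagged span sits in the original markup; recovering that would need extra bookkeeping.

```python
import re
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Hypothetical helper: collects text data, dropping anything inside <del>."""

    def __init__(self):
        super().__init__()
        self.del_depth = 0  # number of <del> elements we are currently inside
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "del":
            self.del_depth += 1

    def handle_endtag(self, tag):
        if tag == "del" and self.del_depth > 0:
            self.del_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside any <del> element.
        if self.del_depth == 0:
            self.parts.append(data)

    def text(self):
        return "".join(self.parts)


parser = VisibleText()
parser.feed("<p>hi this is a reg<del>U</del><ins>u</ins>lar expression match</p>")
parser.close()
visible = parser.text()
print(visible)  # hi this is a regular expression match
print(bool(re.search(r"\bregular\b", visible, re.IGNORECASE)))  # True
```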

Brian Agnew