
My current regex pattern:

(?s)<p[^>]*>(.*?)\bsomethin\b(.*?)</p>

HTML fragment to search:

<p>somethin</p>
<p>nuthin</p>

If I run it against the HTML fragment above, it matches <p>somethin</p>, which is what I need.

However, if I change the word embedded in the pattern from "somethin" to "nuthin", it matches both p tags in their entirety, when I only want the second tag pair. The behavior is the same if I nest the p tags rather than keeping them on a single line, and I want that case handled too.
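
A minimal sketch of the behavior described above. It assumes the pattern reads `(?s)<p[^>]*>(.*?)\bWORD\b(.*?)</p>` and uses Python's `re` as a stand-in for IntelliJ's Java-based regex search (`(?s)`, `\b` and lazy `.*?` behave the same way in `java.util.regex`). The helper `first_match` is a hypothetical name.

```python
import re

html = "<p>somethin</p>\n<p>nuthin</p>"


def first_match(word):
    # The question's pattern with the search word substituted in.
    # (?s) lets "." match newlines; both (.*?) groups are lazy.
    pattern = r"(?s)<p[^>]*>(.*?)\b" + re.escape(word) + r"\b(.*?)</p>"
    m = re.search(pattern, html)
    return m.group(0) if m else None


print(repr(first_match("somethin")))  # '<p>somethin</p>'
print(repr(first_match("nuthin")))    # '<p>somethin</p>\n<p>nuthin</p>'
```

Lazy quantifiers only keep each group as short as possible for a given starting position. The engine still tries starting positions from left to right, and the first `<p>` already produces a match once `(.*?)` stretches across `</p>\n<p>`, which is why the "nuthin" search swallows both elements.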

Thanks.

bchesley
  • I don't understand what you want... What do you mean by "when I only want the second tag set"? Also, it's better to use a dedicated library to manipulate HTML rather than regexes. – m0skit0 Nov 08 '11 at 15:43
  • Welcome to the "why not to parse HTML with regex" tutorial. If your skill is high enough, you should write an HTML parser with regex yourself, from scratch. If not, I would *suggest* that you use whatever utility your platform provides for dealing with HTML. – FailedDev Nov 08 '11 at 15:50
  • I believe his issue is that if he transposes the somethin and nuthin contents, it will match from the <p> before nuthin all the way to the </p> after somethin. – Asmor Nov 08 '11 at 16:45
  • Asmor is correct. My mention of nesting may have misled: the tags are NOT nested within each other, but in my real (i.e., not simplified) use case the content runs over more than one line before the closing tag. I am trying to capture the entire tag envelope, but in the "nuthin" example I get two entire tags: too greedy, you know. – bchesley Nov 08 '11 at 22:11
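
The comments above recommend an HTML-aware tool instead of a regex. As a rough illustration only (it is not something IntelliJ's find dialog can run), here is a sketch using Python's standard-library `html.parser.HTMLParser`. The class `ParagraphCollector` and the function `paragraphs_containing` are hypothetical names, the sketch assumes `<p>` elements are not nested, and it returns each matching paragraph's text rather than its raw markup.

```python
import re
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text of each <p> element; assumes <p> elements do not nest."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # text of every completed <p> element
        self._buffer = None   # text pieces of the currently open <p>, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        # Text outside any <p> (e.g. the newline between elements) is ignored.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None


def paragraphs_containing(markup, word):
    """Return the text of each <p> that contains `word` as a whole word."""
    collector = ParagraphCollector()
    collector.feed(markup)
    collector.close()
    whole_word = re.compile(r"\b" + re.escape(word) + r"\b")
    return [text for text in collector.paragraphs if whole_word.search(text)]


fragment = "<p>somethin</p>\n<p>nuthin</p>\n<p>\n  multi-line nuthin\n</p>"
print(paragraphs_containing(fragment, "nuthin"))
# ['nuthin', '\n  multi-line nuthin\n']
```

Because the parser tracks where each `<p>` opens and closes, a paragraph spread over several lines is handled the same way as a single-line one, and a search for "nuthin" never pulls in the neighboring "somethin" paragraph.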

3 Answers


The expression is quite strange and I don't fully understand what you want. But if you want to match every tag, try this regex:

(?s)<(.+?)>\b\w+\b</\1>

Please clarify your question.
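
A quick check of this pattern in Python's `re`, used here as an assumed stand-in for IntelliJ's Java-based regex engine. Note that the pattern does not search for a particular word: it matches any element whose entire content is a single word. The `multi_line` string is a hypothetical element whose content sits on its own line.

```python
import re

# The answer's pattern: an opening tag, a single word as the entire content,
# then the closing tag named by the \1 backreference.
pattern = re.compile(r"(?s)<(.+?)>\b\w+\b</\1>")

single_line = "<p>somethin</p>\n<p>nuthin</p>"
multi_line = "<p>\n  nuthin\n</p>"  # hypothetical multi-line element

print([m.group(0) for m in pattern.finditer(single_line)])
# ['<p>somethin</p>', '<p>nuthin</p>']
print([m.group(0) for m in pattern.finditer(multi_line)])
# []
```

The single-line fragment matches, but the multi-line element does not: `>\b\w+` requires a word character immediately after the `>`, and the newline there breaks it. That may be why the pattern found nothing in the asker's multi-line code.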

Zernike
  • The question is too general: no language, no details. I wrote the regex for the author's example. Also, nothing says whether we can use a regex or must take another approach. – Zernike Nov 08 '11 at 16:51
  • My target environment is regex find or find in files in the IntelliJ IDE. The suggested pattern did not match anything in my code fragment. – bchesley Nov 08 '11 at 22:13

If you want to select only tags whose content is exactly the string, use this: <(\w+).*?>(somethin)<\/\1>

If you want to select tags containing the string as a substring, use this: <(\w+).*?>.*?(somethin).*?<\/\1>
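
A sketch exercising both patterns in Python's `re` (assumed here as a stand-in for IntelliJ's Java-based engine), with "nuthin" as the search word. The third paragraph in `html` is a hypothetical addition to show the substring case, and the last pattern adds `(?s)`, as one might to allow multi-line content.

```python
import re

html = "<p>somethin</p>\n<p>nuthin</p>\n<p>a nuthin b</p>"

# Exact content: the element's whole text must be the word.
exact = re.compile(r"<(\w+).*?>(nuthin)<\/\1>")
# Substring: the word may appear anywhere in the element's text.
substring = re.compile(r"<(\w+).*?>.*?(nuthin).*?<\/\1>")
# The same substring pattern with (?s), so "." also matches newlines.
substring_dotall = re.compile(r"(?s)<(\w+).*?>.*?(nuthin).*?<\/\1>")

print([m.group(0) for m in exact.finditer(html)])
# ['<p>nuthin</p>']
print([m.group(0) for m in substring.finditer(html)])
# ['<p>nuthin</p>', '<p>a nuthin b</p>']
print([m.group(0) for m in substring_dotall.finditer(html)])
# ['<p>somethin</p>\n<p>nuthin</p>', '<p>a nuthin b</p>']
```

Without `(?s)`, `.` does not match a newline, so the lazy `.*?` cannot run from one line into the next. That keeps these patterns from over-matching on the line-per-element fragment, but it also means an element whose content spans several lines is not found, and adding `(?s)` brings back the original over-matching.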

Maciej Kozieja

Here's what I'd recommend:

(<([^>\s]*)[^>]*>[^<]*somethin[^<]*</\2>)

This won't work if there are nested HTML tags inside your parent element, but otherwise you should be golden.
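
A sketch of this pattern in Python's `re`, again an assumed stand-in for IntelliJ's Java-based engine. It assumes the character class in the answer is meant to be `[^<]*` (any run of characters other than `<`), which fits the caveat about nested tags. The helper `tag_containing` and the sample strings are hypothetical.

```python
import re


def tag_containing(word):
    # Hypothetical helper: Asmor's pattern with the class read as [^<]*.
    # Group 2 captures the tag name, and </\2> demands the matching close tag.
    # [^<]* allows any characters except '<' (newlines included).
    return re.compile(r"(<([^>\s]*)[^>]*>[^<]*" + re.escape(word) + r"[^<]*</\2>)")


two_tags = "<p>somethin</p>\n<p>nuthin</p>"
multi_line = "<p>\n  some nuthin here\n</p>"
nested = "<p><b>nuthin</b></p>"

pat = tag_containing("nuthin")
print(repr(pat.search(two_tags).group(1)))    # '<p>nuthin</p>'
print(repr(pat.search(multi_line).group(1)))  # '<p>\n  some nuthin here\n</p>'
print(repr(pat.search(nested).group(1)))      # '<b>nuthin</b>'
```

Because `[^<]*` cannot cross a `<`, the match can never run through `</p>\n<p>` into the neighboring element, yet a negated class does match newlines, so multi-line content works without `(?s)`. In the nested case the match lands on the inner `<b>` element instead of the enclosing `<p>`, which is the limitation the answer mentions.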

Asmor