Regex Matching Help

Question

two<div
class="blogger-post-footer"><img
width='1' height='1'
src='https://blogger.googleusercontent.com/tracker/4997742813462440000-8247376481926663915?l=isthereanyurlnamesleft.blogspot.com'
alt='' /></div>

I need to match from <div to </div>

To be clear, a regex is not the right solution to this problem. — John Leidegren, May 31 '11 at 20:34
I am curious why you say it is not the right answer. I feel like I should be able to pull this post from Blogger without getting the tracker link injected in. — Tegra Detra, May 31 '11 at 20:41
Anyone who attempts to learn regex inadvertently tries to apply it to some sort of HTML parsing sooner or later. Regex is simply ill-equipped for that kind of work. It's primarily for pattern matching, and what you're matching against here (given the provided answers) revolves around lazy (as opposed to greedy) matches. The problem with this regex approach (albeit that it works) is that there's no semantic analysis here whatsoever. That the `</div>` is the right match here is pure luck (no matter how probable that match is). — John Leidegren, May 31 '11 at 21:08
Also, you should read this http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags — John Leidegren, May 31 '11 at 21:12
I have to agree with you that this is a naive approach but for the simple context I am in this is definitely an appropriate hack. — Tegra Detra, May 31 '11 at 22:11
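For comparison with the regex answers, here is a minimal sketch of the parser-based route the comments allude to, using Python's standard `html.parser`. The class name `FooterStripper`, the function `strip_footer`, and the placeholder tracker URL are illustrative assumptions, not part of the original post; the only detail taken from the question is the `blogger-post-footer` class.

```python
from html.parser import HTMLParser


class FooterStripper(HTMLParser):
    """Re-emits the input, skipping <div class="blogger-post-footer">...</div>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # > 0 while inside the footer div

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth += 1  # track nested divs inside the footer
            return
        if tag == "div" and dict(attrs).get("class") == "blogger-post-footer":
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img ... /> never change the depth.
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "div":
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_footer(html):
    parser = FooterStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Sample shaped like the question's snippet; the URL is a placeholder.
sample = (
    "<p>one</p>two<div\nclass=\"blogger-post-footer\"><img\n"
    "width='1' height='1'\nsrc='https://example.invalid/tracker'\n"
    "alt='' /></div>"
)
cleaned = strip_footer(sample)
```

Unlike a lazy regex match, this keeps track of nested `<div>` elements, so the closing tag it stops at is the one that actually belongs to the footer.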

score 1 · Accepted Answer · answered May 31 '11 at 20:33

You can use (<div.*?<\/div>). In the first backreference/group you get the match from <div to </div>.

You must use the /s flag (or an equivalent for the language you use) with this regex to let the . match newlines. Documentation about /s says:

To simplify multi-line substitutions, the "." character never matches a newline unless you use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't.
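As a rough illustration of why the flag matters, the sketch below runs the same pattern with and without dot-matches-newline in Python, where the equivalent of Perl's /s is `re.DOTALL`. The sample string mirrors the question's snippet, with a placeholder URL that is an assumption.

```python
import re

html = """two<div
class="blogger-post-footer"><img
width='1' height='1'
src='https://example.invalid/tracker'
alt='' /></div>"""

pattern = r"(<div.*?</div>)"

# Without the flag, "." stops at the first newline, so nothing matches.
no_flag = re.search(pattern, html)

# With re.DOTALL, "." also matches newlines and the lazy match succeeds.
with_flag = re.search(pattern, html, re.DOTALL)
```

Here `no_flag` is `None`, while `with_flag.group(1)` holds the text from `<div` through the first `</div>`.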

score 1 · Answer 2 · answered May 31 '11 at 20:50

This should do it:

    <(div)\b[^>]*>((.|\n)*?)</div>
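A small sketch of how a pattern of this shape behaves in Python, assuming the tag name is written as a literal `div`. Because `(.|\n)` matches newlines explicitly, no dot-matches-newline flag is needed. The sample string and its placeholder URL are assumptions modelled on the question.

```python
import re

html = """two<div
class="blogger-post-footer"><img
width='1' height='1'
src='https://example.invalid/tracker'
alt='' /></div>"""

# Group 1: tag name, group 2: everything between the opening and closing tag.
pattern = re.compile(r"<(div)\b[^>]*>((.|\n)*?)</div>")

m = pattern.search(html)
tag_name = m.group(1)
inner = m.group(2)
```

`tag_name` is `"div"` and `inner` is the `<img ... />` element, newlines included.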

charlie

score 0 · Answer 3 · edited May 23 '17 at 12:12

You may try the below:

<div.*?</div>

Be sure to enable DOTALL or SINGLELINE mode.
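For the asker's stated goal of dropping the injected tracker, the same pattern can be used to delete the match. This is a sketch in Python, where DOTALL can also be switched on inline with `(?s)`; the sample string and its placeholder URL are assumptions.

```python
import re

html = """two<div
class="blogger-post-footer"><img
width='1' height='1'
src='https://example.invalid/tracker'
alt='' /></div>"""

# (?s) enables dot-matches-newline for the whole pattern.
cleaned = re.sub(r"(?s)<div.*?</div>", "", html)
```

On this input `cleaned` is just `"two"`. Note that this removes every `<div>...</div>` span it finds, not only the footer, so it only suits input as simple as the one in the question.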

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS * IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR * CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Above message for: RegEx match open tags except XHTML self-contained tags

answered May 31 '11 at 20:32

manojlds

But with option dot match newline, which depends on the regex engine you use. – reeaal May 31 '11 at 20:33
@reeaal - hello, why the downvote? I just forgot to mention it. – manojlds May 31 '11 at 20:34
you did not really post a gnu license for your answer? – reeaal May 31 '11 at 20:39
@reeaal - `IN NO EVENT SHALL THE COPYRIGHT OWNER OR * CONTRIBUTORS BE LIABLE` is what I wanted, not really a license :) – manojlds May 31 '11 at 20:44

score 0 · Answer 4 · answered May 31 '11 at 20:47

 <(?:([a-zA-Z\?][\w:\-]*)(\s(?:\s*[a-zA-Z][\w:\-]*(?:\s*=(?:\s*"(?:\\"|[^"])*"|\s*'(?:\\'|[^'])*'|[^\s>]+))?)*)?(\s*[\/\?]?)|\/([a-zA-Z][\w:\-]*)\s*|!--((?:[^\-]|-(?!->))*)--|!\[CDATA\[((?:[^\]]|\](?!\]>))*)\]\])>

This is a template for extracting any HTML, taken from http://gskinner.com/RegExr/
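One way to try the template is to run it over the question's snippet and list the tags it finds. The sketch below is in Python; the group meanings (1 = opening tag name, 4 = closing tag name) are read off the pattern itself, and the helper name `tag_names` and the placeholder URL are assumptions.

```python
import re

# The template from the answer, unchanged, as a raw triple-quoted string.
TAG = re.compile(r'''<(?:([a-zA-Z\?][\w:\-]*)(\s(?:\s*[a-zA-Z][\w:\-]*(?:\s*=(?:\s*"(?:\\"|[^"])*"|\s*'(?:\\'|[^'])*'|[^\s>]+))?)*)?(\s*[\/\?]?)|\/([a-zA-Z][\w:\-]*)\s*|!--((?:[^\-]|-(?!->))*)--|!\[CDATA\[((?:[^\]]|\](?!\]>))*)\]\])>''')


def tag_names(html):
    """Return tag names in document order; closing tags get a leading '/'."""
    names = []
    for m in TAG.finditer(html):
        if m.group(1):        # opening or self-closing tag
            names.append(m.group(1))
        elif m.group(4):      # closing tag
            names.append("/" + m.group(4))
        # comments (group 5) and CDATA (group 6) are ignored here
    return names


html = """two<div
class="blogger-post-footer"><img
width='1' height='1'
src='https://example.invalid/tracker'
alt='' /></div>"""

names = tag_names(html)
```

The template only tokenizes individual tags; pairing an opening `<div>` with its closing `</div>` still has to be done by the calling code.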

Amir Ismail

slightly oversized for that problem :) – reeaal May 31 '11 at 20:50
I agree it is oversized but it may help in another big issue. – Amir Ismail May 31 '11 at 21:08
