1
two<div
class="blogger-post-footer"><img
width='1' height='1'
src='https://blogger.googleusercontent.com/tracker/4997742813462440000-8247376481926663915?l=isthereanyurlnamesleft.blogspot.com'
alt='' /></div>

I need to match from <div to </div>

Wessel Kranenborg
Tegra Detra
  • What regex have you tried by yourself? – Wessel Kranenborg May 31 '11 at 20:27
  • To be clear, a regex is not the right solution to this problem. – John Leidegren May 31 '11 at 20:34
  • I am curious why you say it is not the right answer. I feel like I should be able to pull this post from Blogger without getting the tracker link injected in. – Tegra Detra May 31 '11 at 20:41
  • Anyone who attempts to learn regex inadvertently tries to apply it to some sort of HTML parsing sooner or later. Regex is simply ill-equipped for that kind of work. It's primarily for pattern matching, and what you're matching against (given the provided answers) revolves around lazy (as opposed to greedy) matches. The problem with this regex approach (albeit that it works) is that there's no semantic analysis here whatsoever. That the `</div>` is the right match here is pure luck (no matter how probable that match is). – John Leidegren May 31 '11 at 21:08
  • Also, you should read this http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – John Leidegren May 31 '11 at 21:12
  • I have to agree with you that this is a naive approach but for the simple context I am in this is definitely an appropriate hack. – Tegra Detra May 31 '11 at 22:11
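
The concern raised in the comments can be made concrete with a small Python sketch. The `nested` string, the `DivStripper` class, and the `text_outside_divs` helper are all hypothetical names introduced for illustration; none of them come from the thread. The sketch shows the lazy pattern stopping at the first `</div>` it meets when divs are nested, and one possible parser-based alternative that tracks nesting depth using the standard library's `html.parser`.

```python
import re
from html.parser import HTMLParser

# Hypothetical input with a nested div (not taken from the question).
nested = '<div class="outer">a<div class="inner">b</div>c</div>tail'

# The lazy pattern stops at the first </div>, which closes the inner div.
lazy_match = re.search(r"<div.*?</div>", nested, re.DOTALL).group(0)


class DivStripper(HTMLParser):
    """Collects text that lies outside every <div>...</div>, tracking depth."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0   # how many <div> elements are currently open
        self.kept = []   # text fragments seen while depth == 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.kept.append(data)


def text_outside_divs(html):
    parser = DivStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered at the end of input
    return "".join(parser.kept)


print(lazy_match)                 # <div class="outer">a<div class="inner">b</div>
print(text_outside_divs(nested))  # tail
```

Note that this sketch keeps only text and drops any other markup outside the divs, so it is an illustration of depth tracking rather than a general-purpose cleaner. For a single, non-nested footer div like the one in the question, the lazy regex gives the same practical result.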

4 Answers

1

You can use (<div.*?<\/div>). The first capture group then holds the match from <div to </div>.

You must use the /s flag (or the equivalent in the language you use) with this regex so that . matches newlines. The Perl documentation on /s says:

To simplify multi-line substitutions, the "." character never matches a newline unless you use the /s modifier, which in effect tells Perl to pretend the string is a single line--even if it isn't.
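
As a rough illustration of why the flag matters, here is a Python sketch in which `re.DOTALL` plays the role of Perl's /s. The `post` string mirrors the shape of the snippet in the question, with the tracker URL replaced by a placeholder; the variable names are assumptions made for this example.

```python
import re

# Same layout as the question's snippet; the src URL is a placeholder.
post = (
    "1\n"
    "two<div\n"
    'class="blogger-post-footer"><img\n'
    "width='1' height='1'\n"
    "src='https://example.invalid/tracker'\n"
    "alt='' /></div>\n"
)

pattern = r"(<div.*?</div>)"

# Without the flag, "." cannot cross the line breaks inside the tag.
without_flag = re.search(pattern, post)

# With re.DOTALL, "." also matches newlines, so the lazy match can reach </div>.
with_flag = re.search(pattern, post, re.DOTALL)

print(without_flag)               # None
print(with_flag.group(1)[:10])    # first characters of the captured div
```

In Python the forward slash needs no escaping, which is why the pattern uses `</div>` rather than `<\/div>`.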

Wessel Kranenborg
1

This should do it:

    <(div)\b[^>]*>((.|\n)*?)</div>
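
A pattern of this shape does not need a single-line flag, because `[^>]` and the explicit `\n` alternative already cover line breaks. The Python sketch below uses a slightly simplified variant (non-capturing inner group, no group around the tag name) alongside the common `[\s\S]` spelling; the `sample` string and variable names are made up for the example.

```python
import re

# No DOTALL flag here: newlines are handled by the pattern itself.
alternation = re.compile(r"<div\b[^>]*>((?:.|\n)*?)</div>")
char_class = re.compile(r"<div\b[^>]*>([\s\S]*?)</div>")

# Hypothetical input with line breaks inside the tag and inside the body.
sample = "before<div\nid='x'>line one\nline two</div>after"

inner_a = alternation.search(sample).group(1)
inner_b = char_class.search(sample).group(1)

print(inner_a == inner_b)  # True
print(repr(inner_a))       # 'line one\nline two'

# The \b keeps the pattern from matching longer tag names such as <divider>.
print(alternation.search("<divider>x</div>"))  # None
```

The `[\s\S]` form tends to be cheaper than the `(.|\n)` alternation on long inputs, since it avoids trying two branches per character.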
charlie
0

You may try the below:

<div.*?</div>

Be sure to enable DOTALL (also called SINGLELINE) mode so that . matches newlines.
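
Since the stated goal in the comments is to drop the injected tracker from the post, here is one way this might look in Python, using the inline `(?s)` flag as the DOTALL switch. The `strip_divs` helper and the shortened `post` string are hypothetical and only meant as a sketch.

```python
import re


def strip_divs(html: str) -> str:
    """Remove every non-nested <div ...>...</div> span from the text."""
    # (?s) at the start of the pattern is the inline form of re.DOTALL.
    return re.sub(r"(?s)<div.*?</div>", "", html)


# Shortened stand-in for the Blogger post body from the question.
post = "1\ntwo<div\nclass=\"blogger-post-footer\"><img\nalt='' /></div>\n"

cleaned = strip_divs(post)
print(repr(cleaned))  # '1\ntwo\n'
```

This assumes the post contains no nested divs; with nesting, the lazy match would end at the first closing tag.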

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

Above message for: RegEx match open tags except XHTML self-contained tags

Community
manojlds
0
 <(?:([a-zA-Z\?][\w:\-]*)(\s(?:\s*[a-zA-Z][\w:\-]*(?:\s*=(?:\s*"(?:\\"|[^"])*"|\s*'(?:\\'|[^'])*'|[^\s>]+))?)*)?(\s*[\/\?]?)|\/([a-zA-Z][\w:\-]*)\s*|!--((?:[^\-]|-(?!->))*)--|!\[CDATA\[((?:[^\]]|\](?!\]>))*)\]\])>

This is a template for extracting any HTML, taken from http://gskinner.com/RegExr/

Amir Ismail