Regex select text and nearest wrapper

Question

I have this text

<div>another words</div>
<div>
  some text here
</div>

I want to get the <div> element that contains the word 'text'. The expected result is:

<div>
  some text here
</div>

I can do it like this:

<div>.*text.*<\/div>

but it selects the whole text, from the first <div> to the last </div>.

What's the context? Regex can be a pretty bad choice of tool to parse HTML if the scope isn't very limited. – Robin, Apr 23 '14 at 12:41
Then regex is definitely not your answer. What language are you using? With most HTML parsers this would be both easier and a lot safer to do. – Robin, Apr 23 '14 at 15:06

Taemyr · Accepted Answer · 2014-04-24T08:29:56.093

2

Try

<div>[^<]*text[^<]*<\/div>

This keeps tags out of the inner part of the match.
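
To see the difference between the two patterns, here is a minimal sketch in Python. The language choice and the DOTALL flag on the question's pattern are assumptions (the question names neither); the variable names are made up for illustration.

```python
import re

html = """<div>another words</div>
<div>
  some text here
</div>"""

# The question's pattern. DOTALL is assumed so that "." can cross newlines.
greedy = re.compile(r"<div>.*text.*</div>", re.DOTALL)

# The pattern from this answer: "[^<]" cannot step over any tag,
# so the match has to start at the <div> that directly wraps "text".
bounded = re.compile(r"<div>[^<]*text[^<]*</div>")

greedy_match = greedy.search(html).group(0)
bounded_match = bounded.search(html).group(0)

print(greedy_match == html)  # True: the greedy pattern swallows both divs
print(bounded_match)         # only the second div
```

Note that "[^<]" also matches newlines, so the bounded pattern needs no extra flag for a multi-line div.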

Also, regexp is not an ideal tool for parsing HTML. Consider whether your use case is better served by "proper" HTML parsing tools.
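
As a rough idea of what the parser route could look like, here is a sketch using Python's standard html.parser module. The class name DivFinder and its attributes are hypothetical, and it returns the text of the nearest enclosing div rather than the original markup, so treat it as a starting point only.

```python
from html.parser import HTMLParser


class DivFinder(HTMLParser):
    """Collect the text of each <div> that directly wraps a given word."""

    def __init__(self, word):
        super().__init__()
        self.word = word
        self.stack = []    # one text buffer per currently open <div>
        self.matches = []  # stripped text of closed divs containing the word

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.stack.append([])

    def handle_data(self, data):
        # Credit text only to the innermost open div (the nearest wrapper).
        # Inline tags such as <b> do not open a new buffer.
        if self.stack:
            self.stack[-1].append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            text = "".join(self.stack.pop())
            if self.word in text:
                self.matches.append(text.strip())


finder = DivFinder("text")
finder.feed("<div>another words</div>\n<div>\n  some <b>text</b> here\n</div>")
print(finder.matches)  # ['some text here']
```

Because the parser tracks open divs on a stack, inner tags like [b] or [strong] and nested divs do not confuse it the way they confuse a regex.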

Edit: If you have nested tags you are definitely leaving the area where regexp is a suitable tool. However, you might be able to use a negative lookbehind:

<div>(.(?<!<div>))*text(.(?<!<div>))*<\/div>

This will misbehave if you need to handle nested divs, and probably in other edge cases. Use at your own risk.
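
For what it's worth, here is how that pattern behaves in Python on a div with an inner tag. The DOTALL flag is an assumption (without it "." would not cross the newlines in the sample), and the groups are written as non-capturing purely for tidiness.

```python
import re

# Each "." is followed by a check that the text just consumed does not
# end in "<div>", so the match can never run across another opening div.
pattern = re.compile(
    r"<div>(?:.(?<!<div>))*text(?:.(?<!<div>))*</div>",
    re.DOTALL,
)

html = "<div>another words</div>\n<div>\n  some <b>text</b> here\n</div>"
found = pattern.search(html).group(0)
print(found)
```

The match starts at the second div and keeps the inner [b] tag, which the "[^<]" version cannot do. The warnings above still apply.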

edited Apr 24 '14 at 08:29

answered Apr 23 '14 at 12:31


Thanks, it works, but what can I do if the div element contains other tags, e.g. [b] or [strong]? – Wachburn Apr 23 '14 at 13:02
@Wachburn Use tools dedicated to HTML parsing. – Taemyr Apr 24 '14 at 08:21
@Wachburn Answer edited for a possibility if you insist on poking sleeping old ones. – Taemyr Apr 24 '14 at 08:31

score 0 · Answer 2 · answered Apr 23 '14 at 12:49


$html = <<< EOF
<div>another words</div>
<div>
  some text here
</div>
EOF;

// The "s" modifier lets "." match newlines; the lazy ".*?" keeps the capture short.
preg_match('%<div>\s+(.*?text.*?)\s+</div>%s', $html, $result);
$result = $result[1];
echo $result;
// some text here

http://ideone.com/qwFlJ8

Pedro Lobito

