
I have this text:

<div>another words</div>
<div>
  some text here
</div>

I want to get the <div> element that contains the word 'text'. The expected result is:

<div>
  some text here
</div>

I tried this:

<div>.*text.*<\/div>

but it selects all of the text.

MaVRoSCy
Wachburn
  • What's the context? Regex can be a pretty bad choice of tool for parsing HTML if the scope isn't very limited. – Robin Apr 23 '14 at 12:41
  • @Robin I have very dirty HTML, with unclosed tags etc. – Wachburn Apr 23 '14 at 13:06
  • Then regex is definitely not your answer. What language are you using? With most HTML parsers this would be both easier and a lot safer to do. – Robin Apr 23 '14 at 15:06

2 Answers


Try

<div>[^<]*text[^<]*<\/div>

This keeps tags out of the inner part of the match: [^<]* matches any run of characters except '<', so the match cannot cross into another tag.
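For illustration, here is a minimal PHP sketch of that pattern with preg_match, assuming the sample input from the question. Since [^<] also matches line breaks, no s modifier is needed.

```php
<?php
// Sample input from the question.
$html = "<div>another words</div>\n<div>\n  some text here\n</div>";

// [^<]* never crosses a '<', so the inner part cannot span another tag.
$pattern = '/<div>[^<]*text[^<]*<\/div>/';

if (preg_match($pattern, $html, $m)) {
    echo $m[0], "\n"; // the second <div>, including its tags
}
```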

Also, regex is not an ideal tool for parsing HTML. Consider whether your use case is better served by a proper HTML parser.
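If a parser is an option, a minimal sketch with PHP's DOM extension (assuming it is enabled) might look like the following. DOMDocument::loadHTML is lenient and repairs malformed markup such as unclosed tags, which matters for the dirty HTML mentioned in the comments; the variable names are only illustrative.

```php
<?php
// Sample input from the question.
$html = "<div>another words</div>\n<div>\n  some text here\n</div>";

$doc = new DOMDocument();
// Collect libxml's complaints about malformed markup instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$found = [];
foreach ($doc->getElementsByTagName('div') as $div) {
    // textContent also includes text from nested elements.
    if (strpos($div->textContent, 'text') !== false) {
        $found[] = $doc->saveHTML($div); // markup of the matching element
    }
}

echo implode("\n", $found), "\n";
```

Note that with nested divs an outer div also "contains" the word, so you may want to keep only the innermost match.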

Edit: If you have nested tags, you are definitely leaving the area where regex is a suitable tool. However, you might be able to use a negative lookbehind:

<div>(.(?<!<div>))*text(.(?<!<div>))*<\/div>

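This will misbehave if you need to handle nested divs, and probably in other edge cases too, so use it at your own risk. Note that . only matches line breaks when the s (DOTALL) modifier is set, so the multi-line example needs it.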
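A minimal PHP sketch of the lookbehind version, again assuming the question's sample input. Without the s modifier the same pattern finds nothing here, because the target div spans several lines.

```php
<?php
// Sample input from the question.
$html = "<div>another words</div>\n<div>\n  some text here\n</div>";

// Each (.(?<!<div>)) consumes one character, then the lookbehind rejects it
// if the last five characters read "<div>", so the match cannot run into a
// later opening tag. The s modifier lets . match the newlines.
$pattern = '/<div>(.(?<!<div>))*text(.(?<!<div>))*<\/div>/s';

if (preg_match($pattern, $html, $m)) {
    echo $m[0], "\n";
}
```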

Taemyr
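<?php
$html = <<<EOF
<div>another words</div>
<div>
  some text here
</div>
EOF;

// \s+ (not s+) skips the whitespace around the inner text;
// the s modifier lets .*? match across newlines.
if (preg_match('%<div>\s+(.*?text.*?)\s+</div>%s', $html, $result)) {
    echo $result[1]; // some text here
}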

http://ideone.com/qwFlJ8

Pedro Lobito