3

Okay, let's say I have this string:

<div class='box'>i like the world</div><div class='box'>i like my computer</div>

How would I go about echoing the divs that contain the word "world"? Would this involve some sort of regex?

Thank you so much in advance.

  • Why the downvote? I'm not necessarily asking anyone to code this for me, I'm just asking how I should go about it! –  Dec 26 '12 at 02:45
  • 1
    Regex or http://php.net/manual/en/book.dom.php – Supericy Dec 26 '12 at 02:50
  • Actually it's DOM or some other HTML parser. – PeeHaa Dec 26 '12 at 02:52
  • 1
    @DumbProducts: `I'm not necessarily asking anyone to code this for me, I'm just asking how I should go about it!`, Have a read on the [FAQ on asking questions](http://stackoverflow.com/faq#questions). Questions on "how do I go about it.." in a general term can lead to debates and are not constructive. You should be showing the code you have issues with, list what you have tried and what the expected outcome is. Asking hypotheticals on how to go about things is not good for Q&A and can get questions closed as "not constructive" or "not a real question" – Nope Dec 27 '12 at 01:37

2 Answers

5

Using DOMDocument and DOMXPath you can easily do this:

<?php

$html = "<div class='box'>i like the world</div><div class='box'>i like my computer</div>";

// Parse the string into a DOM tree.
$doc = new DOMDocument();
$doc->loadHTML($html);

// Select every div whose text contains "world".
$xPath = new DOMXPath($doc);
$nodes = $xPath->query("//div[contains(text(),'world')]");

Now $nodes contains all the div elements whose text contains the word "world".

Demo: http://codepad.viper-7.com/Dhalvh

Please note that you don't want to try to parse HTML with regex: it's a matter of when, not if, it is going to break.
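To go from the node list to actual output, one option is to serialize each matching node with DOMDocument::saveHTML(), which accepts a node argument as of PHP 5.3.6. A minimal sketch, assuming the same $html string as in the question; the $matches array is just an illustrative name:

```php
<?php

// Same input string as in the question.
$html = "<div class='box'>i like the world</div><div class='box'>i like my computer</div>";

$doc = new DOMDocument();
$doc->loadHTML($html);

$xPath = new DOMXPath($doc);
$nodes = $xPath->query("//div[contains(text(),'world')]");

// Serialize each matching div back to markup ($matches is a hypothetical name).
$matches = [];
foreach ($nodes as $node) {
    $matches[] = $doc->saveHTML($node);
}

// Echo the matching divs, one per line.
echo implode("\n", $matches), "\n";
```

This should echo only the first div. The serialized markup may differ cosmetically from the input (for example in attribute quoting), since it is regenerated from the DOM tree rather than copied from the original string.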

PeeHaa
  • I have to agree on this being a better answer since I consider it to be much safer and future proof (meaning potential changes to the code will still result in expected behavior while regexes might not guarantee that). –  Dec 26 '12 at 03:08
  • Thank you so much both for your answers! Both answers work, but I'd like to know which one is less resource-hungry, and why regex is bad (I hear EVERYWHERE that regex is bad, but I'd like to know why, e.g. where it'd be a security threat or would parse). Thank you so much! –  Dec 26 '12 at 03:38
  • @PeeHaa yep, I read the last sentence. :) Charles on the answer above ^ clarified even further. Thanks for your post! I marked yours as correct. –  Dec 26 '12 at 05:46
  • Actually your answer, @PeeHaa, is WAY better than the regex solution due to the fact that in the regex solution there are two strings, but in yours there's one. And that's what I want, one string... Thanks so much for your answer and time! –  Dec 26 '12 at 18:50
1
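<?php
    // Each div is held as a separate string.
    $html = [];
    $html[0] = "<div class='box'>i like the world</div>";
    $html[1] = "<div class='box'>i like my computer</div>";

    // Echo every div whose markup contains "world" (case-insensitive).
    foreach ($html as $div) {
        if (preg_match("/world/i", $div)) {
            echo $div;
        }
    }
?>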

Yes, a regex would be a convenient way to do it I guess.

marcinx
  • 1
    "a regex would be an easy way to do it." It is also pretty fragile – PeeHaa Dec 26 '12 at 02:56
  • 2
    Holds water as long as there is no situation like this: `No I don't contain what you are looking for`. Using a parser is not that hard and it's usually way safer. –  Dec 26 '12 at 02:59
  • True story, holodoc. Can be resolved by moving the div to the echo output and testing for a match on the div content only. – marcinx Dec 26 '12 at 03:03
  • Still, HTML is way too volatile, especially if it's dynamically generated. Yes, DOM parsers can sometimes be pretty resource intensive, but they are way safer than regular expressions. –  Dec 26 '12 at 03:05
  • Thank you so much both for your answers! Both answers work, but I'd like to know which one is less resource-hungry, and why regex is bad (I hear EVERYWHERE that regex is bad, but I'd like to know why, e.g. where it'd be a security threat or would parse). Thank you so much! –  Dec 26 '12 at 03:29
  • @DumbProducts, there are two main problems with regex parsing of HTML: 1) few regex implementations do well with nested pattern matching, and 2) HTML tends to be malformed frequently. It's simply easier to throw the document at the DOM and work from there. Regexes can frequently fail, while firing up the DOM and doing *extensive* work there can be resource intensive and slower. I'd pick the latter, as sanity is better than speed and occasional brokenness. See also http://htmlparsing.com/php.html and http://stackoverflow.com/q/1732348/168868 – Charles Dec 26 '12 at 04:21
  • @Charles thanks for the response! Helped me a lot! :) Oh, and I picked the DOMDocument post as the correct answer, but I upvoted both. –  Dec 26 '12 at 05:49
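
The fragility discussed in the comments above can be seen in a small sketch. The input here is hypothetical (not from the question): the first div mentions "world" only in its class attribute. The strip_tags() variant is one possible reading of the "match on the div content only" suggestion, not necessarily what the commenter had in mind.

```php
<?php

// Hypothetical input: the first div has "world" only in its class attribute.
$divs = [
    "<div class='world'>i like my computer</div>",
    "<div class='box'>i like the world</div>",
];

$regexHits = [];    // regex over the raw markup
$contentHits = [];  // regex over the text left after strip_tags()
foreach ($divs as $div) {
    if (preg_match("/world/i", $div)) {
        $regexHits[] = $div;
    }
    if (preg_match("/world/i", strip_tags($div))) {
        $contentHits[] = $div;
    }
}

// DOM + XPath: the predicate looks at text nodes, not attributes.
$doc = new DOMDocument();
$doc->loadHTML(implode('', $divs));
$xPath = new DOMXPath($doc);
$domHits = [];
foreach ($xPath->query("//div[contains(text(),'world')]") as $node) {
    $domHits[] = $node->textContent;
}

echo count($regexHits), " raw regex match(es), ",
     count($contentHits), " content-only match(es), ",
     count($domHits), " DOM match(es)\n";
// prints "2 raw regex match(es), 1 content-only match(es), 1 DOM match(es)"
```

The raw regex reports a false positive on the first div, while the content-only check and the XPath query both report just the second one. strip_tags() handles this simple case, but it is still string processing rather than parsing, so the DOM approach remains the safer choice for arbitrary markup.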