
The code must check whether a given string appears inside a span.enco-subject tag. As you can see below, the span tag carries other attributes and CSS classes that change from case to case, so its exact markup is unpredictable.

$body = '<p>Lorem ipsum, lorem ipsum. Lorem ipsum, lorem ipsum. Lorem ipsum, lorem ipsum. Lorem ipsum, lorem ipsum. Lorem ipsum, lorem ipsum.</p><p><span id="subject-47" class="enco-subject enco-subject-post-1" data-id="47">Semencic credits his early familiarity with the breed to his own travels to South Africa<span class="enco-comment-count">4</span></span> , but especially to his frequent correspondence with the head of the first South African Boerboel club, one Mr. Kobus Rust. <strong>The&nbsp;Boerboel Breeders Association was established in 1983</strong> in the Senekal district of the Free State with the sole objective of ennobling and promoting the Boerboel as a unique South African dog breed.</p>';

$body2 = 'We all love South Africa because of its <span class="enco-highlight">beautiful scenery</span>. It is not the cheapest country but blah blah blah.';

$string_to_check = 'South Africa';

So, here is what should be returned:

  • body = the string exists within the span.enco-subject tag (but it's a complicated tag)

  • body2 = the string does not exist within span.enco-subject

Lazhar

4 Answers


You might consider using an HTML parser to find the tag you're looking for. A parser lets you query the HTML as a tree of objects, with rich methods for locating the elements you're targeting. PHP comes with this functionality out of the box:

HTML Parsing with PHP

Official PHP DOM Manual
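
As a rough illustration of the parser approach, here is a minimal sketch using PHP's bundled DOMDocument and DOMXPath classes. The function name in_subject_dom is hypothetical (it does not appear in the question), and the sketch assumes the markup is UTF-8 and that "enco-subject" should match as a whole class name.

```php
<?php
// Hypothetical helper (not from the original post): returns true when $needle
// appears in the text of any <span> whose class list contains "enco-subject".
function in_subject_dom(string $needle, string $html): bool
{
    $doc = new DOMDocument();

    // Collect parse warnings internally instead of emitting them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    // The XML encoding hint makes loadHTML() treat the fragment as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match "enco-subject" as a whole class token, so "enco-subject-post-1"
    // on its own does not count.
    $query = '//span[contains(concat(" ", normalize-space(@class), " "), " enco-subject ")]';

    foreach ($xpath->query($query) as $span) {
        // textContent also includes the text of nested elements.
        if (strpos($span->textContent, $needle) !== false) {
            return true;
        }
    }

    return false;
}
```

With the $body and $body2 strings from the question, in_subject_dom('South Africa', $body) should return true and in_subject_dom('South Africa', $body2) should return false, because the only span in $body2 has the class enco-highlight.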

bideowego

You can try the following regex:

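$string_to_check = 'South Africa';
// preg_quote() escapes regex metacharacters (and the "/" delimiter) in the search string.
$regex = '/<span (.*)class="(.*)enco-subject(.*)">(.*)(' . preg_quote($string_to_check, '/') . ')(.*)<\/span>/';
preg_match($regex, $body, $matches);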

The result of var_dump($matches):

array(7) {
    [0]=>
        string(212) "<span id="subject-47" class="enco-subject enco-subject-post-1" data-id="47">Semencic credits his early familiarity with the breed to his own travels to South Africa<span class="enco-comment-count">4</span></span>"
    [1]=>
        string(16) "id="subject-47" "
    [2]=>
        string(13) "enco-subject "
    [3]=>
        string(20) "-post-1" data-id="47"
    [4]=>
        string(76) "Semencic credits his early familiarity with the breed to his own travels to "
    [5]=>
        string(12) "South Africa"
    [6]=>
        string(41) "<span class="enco-comment-count">4</span>"
}
Chin Leung
  • It does not work, it gives me wrong matches: Array ( [0] => Array ( [0] => Semencic credits his early familiarity with the breed to his own travels to South Africa4 [1] => 139 ) [1] => Array ( [0] => id="subject-47" [1] => 145 ) [2] => Array ( [0] => enco-subject [1] => 168 ) [3] => Array ( [0] => -post-1" data-id="47 [1] => 193 ) ....... ) – Lazhar May 05 '16 at 19:45
  • There you go, I've updated the preg match. Basically the string_to_check will always be at index 5 accessible via `$matches[5]` – Chin Leung May 05 '16 at 21:11

Bottom line answer is, you can't do this with any semblance of reliability with regex, principally because of the nested nature of HTML. Consider:

<span class="special"><span class="otherclass">Some text</span>South Africa</span>

You'd want to match that "South Africa", right? But how does the regular expression know that the first </span> doesn't end the outer span with class="special"? It doesn't, and a regex has no way to consume nested, balanced constructs without some sort of built-in tracking mechanism. (.NET has the balancing groups feature that does this.)
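
To make the problem concrete, here is a small sketch that runs a lazy pattern against the nested example and compares it with a DOM lookup. The pattern and the variable names are illustrative assumptions, not code from the question.

```php
<?php
// The nested example from this answer.
$html = '<span class="special"><span class="otherclass">Some text</span>South Africa</span>';

// A lazy pattern stops at the first closing tag, which belongs to the inner span.
preg_match('/<span class="special">(.*?)<\/span>/', $html, $m);
$captured = $m[1];
$regexFound = strpos($captured, 'South Africa') !== false;

// A DOM parser tracks nesting, so the outer span's text includes "South Africa".
$doc = new DOMDocument();
$doc->loadHTML($html);
$outer = $doc->getElementsByTagName('span')->item(0);
$domFound = strpos($outer->textContent, 'South Africa') !== false;

var_dump($regexFound, $domFound);
```

Here $captured ends just before the inner </span>, so the regex check misses the text, while the parser-based check finds it.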

Scott Weaver

I tackled this differently and built this function:

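public function in_subject( $subject, $content ) {

    // Capture everything from "enco-subject" up to the first closing </span>.
    // The "s" modifier lets "." match newlines inside the span.
    // Note: the capture also includes the rest of the opening tag's attributes.
    $regex = '/enco-subject(.*?)<\/span>/s';

    preg_match_all( $regex, $content, $matches );

    // $matches[1] holds the captured text of every occurrence.
    foreach ( $matches[1] as $captured ) {

        // strpos() returns false when nothing is found and 0 for a hit at the start,
        // so compare strictly against false.
        if ( strpos( $captured, $subject ) !== false ) {
            return true;
        }

    }

    return false;
}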

And it does work!

Lazhar