I am trying to get the full string of a <span> element when it contains the text "Bonkers".

For example:

  <span>    Bonkers </span> 

or

 <span>    Bonkers           </span> 

or

<span>              Bonkers                          </span>

The thing is, I don't know the exact structure of the span, but I know "Bonkers" will be in there.

I want to return the entire string (including the opening and closing <span> tags) so that I can replace it later, e.g.:

$spanwithbonkers  = '<span>      Bonkers                      </span>';

So far this is what I have, but it does not work:

 <?php

 $homepage = file_get_contents('http://www.example.com/');

  preg_match('/^<span>^Bonkers^</span>/', $homepage, $matches);

  $spanwithbonkers = $matches[0]);


?>

I'm not sure whether preg_match is even the right tool for this.

halfer
Theo
  • https://stackoverflow.com/a/1732454/787016 – Kirill Polishchuk Oct 29 '18 at 22:22
  • Like I said, I'm not sure if preg_match is even supposed to be used. I need a solution! – Theo Oct 29 '18 at 22:30
  • If there are no `<` `>` in between, you can use [`[^<]*?\bBonkers\b[^<]*<\/span>`](https://regex101.com/r/QQ0O4X/1/). It uses [negated](https://www.regular-expressions.info/charclass.html#negated) `<` in between opening and closing span tags. If you want to write a crawler for arbitrary html, better [look for a html parser](https://stackoverflow.com/questions/3577641/how-do-you-parse-and-process-html-xml-in-php). – bobble bubble Oct 29 '18 at 23:12
  • Your code returns a var_dump string of: string(33) " Bonkers" I assume that there are 33 empty spaces, but I don't know how to get it to return the < span > tags including the empty spaces – Theo Oct 29 '18 at 23:40
  • Make it an answer so I can give you credit. Although I can't see the empty space, when I used the $spanwithbonkers as the string, it picked it up! – Theo Oct 29 '18 at 23:54
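
The regex idea from the comments can be sketched in PHP roughly as follows. This is only an illustration under stated assumptions: the sample string stands in for the fetched page, the opening `<span\b[^>]*>` part is added here so the tags are captured too, and the pattern assumes there is no nested markup inside the span. Note that `^` in the original attempt anchors to the start of the string; it does not mean "anything".

```php
<?php
// Sample input standing in for the fetched page (an assumption for illustration).
$homepage = '<div><span>      Bonkers                      </span><span>other</span></div>';

// <span\b[^>]*>   opening tag, with or without attributes
// [^<]*           any text that does not start another tag
// \bBonkers\b     the word itself
// [^<]*<\/span>   trailing text up to the closing tag
$pattern = '/<span\b[^>]*>[^<]*\bBonkers\b[^<]*<\/span>/';

$spanwithbonkers = null;
if (preg_match($pattern, $homepage, $matches)) {
    // $matches[0] holds the whole match, tags and whitespace included.
    $spanwithbonkers = $matches[0];
}

var_dump($spanwithbonkers);
```

When the result is viewed in a browser, the tags are interpreted as HTML and the whitespace is collapsed, so only the word appears to be there; wrapping the value in `htmlspecialchars()` before printing makes the tags visible.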

1 Answer

A better way than using regex to parse HTML is to use the DOMDocument and DOMXPath classes. You can load the HTML into a document, use XPath to find specific elements, and then process those elements. In your case, we find the spans that include the word Bonkers by testing the text of each span node, e.g.:

$html = '<body><div><div><span id="b">    Bonkers  </span></div></div>
         <div><span> no bonk</span>
              <span> This is bonkers!</span>
         </div></body>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$spans = $xpath->query("//span[contains(text(), 'Bonkers')]");
foreach ($spans as $span) {
    echo $span->C14N();
}

Output:

<span id="b">    Bonkers  </span>

If you want to do a case-insensitive comparison, it's a little more complex:

$spans = $xpath->query('//span');
foreach ($spans as $span) {
    if (stripos($span->textContent, 'Bonkers') !== false) {
        echo $span->C14N() . "\n";
    }
}

Output:

<span id="b">    Bonkers  </span>
<span> This is bonkers!</span>
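
As an alternative sketch, the case-insensitive test can also be pushed into the XPath query itself. XPath 1.0 has no lower-case function, so this version uses `translate()` to fold ASCII letters before calling `contains()`. The sample HTML and the `$found` array are assumptions made for illustration, and `saveHTML($node)` is used here in place of `C14N()` to serialize each match.

```php
<?php
// Sample markup (an assumption for illustration).
$html = '<div><span id="b">    Bonkers  </span>'
      . '<span> no bonk</span>'
      . '<span> This is bonkers!</span></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// translate() maps A-Z to a-z, so the comparison ignores ASCII case.
$query = "//span[contains("
       . "translate(., 'ABCDEFGHIJKLMNOPQRSTUVWXYZ', 'abcdefghijklmnopqrstuvwxyz'), "
       . "'bonkers')]";

$found = [];
foreach ($xpath->query($query) as $span) {
    // Serialize the matched element, tags and whitespace included.
    $found[] = $doc->saveHTML($span);
}

echo implode("\n", $found), "\n";
```

Using `.` instead of `text()` tests the full string value of the span, so the word is also found when it sits after or inside a child element.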

Demo on 3v4l.org
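
Since the stated goal is to replace the span later, one option is to do the replacement in the DOM directly instead of searching for the serialized string. A minimal sketch, assuming the replacement is a hypothetical `<strong>Replaced</strong>` element:

```php
<?php
// Sample markup (an assumption for illustration).
$html = '<body><div><span id="b">    Bonkers  </span></div>'
      . '<div><span> no bonk</span></div></body>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Copy the matches into an array first so the DOM can be modified safely.
$spans = iterator_to_array($xpath->query("//span[contains(., 'Bonkers')]"));

foreach ($spans as $span) {
    // Hypothetical replacement element; swap in whatever markup is needed.
    $replacement = $doc->createElement('strong', 'Replaced');
    $span->parentNode->replaceChild($replacement, $span);
}

// Serialize the whole document back to an HTML string.
$result = $doc->saveHTML();
echo $result;
```

This avoids relying on the serialized span being byte-for-byte identical to the source markup, which is not guaranteed once the HTML has been parsed.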

Nick