I have searched but cannot find a solution that works. I have tried DOM, but the result is not identical to the source: the spacing and some tag elements differ. The differences are minor, but I need an exact copy for further pattern searches on the source, so I would like to try regex. Is this possible? I know it isn't the best solution, but I would like to try it. For example, is it possible to return the entire div with class "want-this-entire-div-class", including its inner content:

$html = '<div class="not-want">
        <div class="also-not-want">
    <div class="want-this-entire-div-class">
<button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
<div class="dropdown-menu j-dropdown">
<div class="header-search">
        <input type="text" name="search" value="" placeholder="Search entire site here..." 
class="search-input" data-category_id=""/>
  <button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
route=product/search&amp;search="></button>
</div>
</div>
</div>
<div class="not-want-this-also">
<div class="or-this">';

The following stops after the first </div>:

preg_match('/<div class="want-this-entire-div-class"(.*?)<\/div>/s', $html, $match);

Thanks

3 Answers

1

One way to tackle this is with a state machine. You enumerate all the possible states, then take action depending on which state you are in. In this case the states are:

  1. line to ignore
  2. target open div
  3. line to add
  4. extra open div
  5. extra close div
  6. target close div

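I don't expect this to be robust (it works line by line, so it depends on the tags being laid out on separate lines as in the example), but it does work for the given example: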

<?php
// Requires PHP 8.0+ for str_contains().
function inner_div(string $html_s, string $cont_s): string {
   $html_a = explode("\n", $html_s);
   $out_a = [];      // lines collected so far
   $div_b = false;   // true once the target div has been seen
   $div_n = 0;       // current nesting depth inside the target div
   foreach ($html_a as $tok_s) {
      # state 2: target open div
      if (! $div_b && str_contains($tok_s, $cont_s)) {
         $div_b = true;
      }
      # state 1: line to ignore
      if (! $div_b) {
         continue;
      }
      # state 3: line to add
      $out_a[] = $tok_s;
      # state 4: extra open div (count every opening tag on the line)
      $div_n += substr_count($tok_s, '<div');
      # state 5: extra close div (count every closing tag on the line)
      $div_n -= substr_count($tok_s, '</div>');
      # state 6: target close div
      if ($div_n <= 0) {
         break;
      }
   }
   return implode("\n", $out_a);
}
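
The same depth-counting idea can also be applied to tag offsets instead of lines, which returns the exact bytes of the source no matter how the markup is wrapped. The following is only a minimal sketch: the function name extract_div and its parameters are hypothetical, it matches the class as a substring of the class attribute, and it assumes no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper: returns the exact source substring of the first div
// whose class attribute contains $class_s, or null if it is missing or unclosed.
// Requires PHP 8.0+ for str_starts_with().
function extract_div(string $html_s, string $class_s): ?string {
   // Locate the opening tag of the target div.
   $open_re = '~<div\b[^>]*\bclass="[^"]*' . preg_quote($class_s, '~') . '[^"]*"[^>]*>~i';
   if (! preg_match($open_re, $html_s, $m, PREG_OFFSET_CAPTURE)) {
      return null;
   }
   $start_n = $m[0][1];
   // Collect every div open/close tag from there on, with its byte offset.
   preg_match_all('~<div\b[^>]*>|</div\s*>~i', $html_s, $tags, PREG_OFFSET_CAPTURE, $start_n);
   $depth_n = 0;
   foreach ($tags[0] as [$tag_s, $pos_n]) {
      // A closing tag starts with "</"; anything else matched is an opening tag.
      $depth_n += str_starts_with($tag_s, '</') ? -1 : 1;
      if ($depth_n === 0) {
         // Depth is back to zero: this closing tag ends the target div.
         return substr($html_s, $start_n, $pos_n + strlen($tag_s) - $start_n);
      }
   }
   return null; // the target div was never closed
}

// Usage on a small sample with an unclosed outer div:
$sample_s = '<div class="a"><div class="b"><div>x</div></div><div class="c">';
$found_s  = extract_div($sample_s, 'b'); // '<div class="b"><div>x</div></div>'
```

Because the result is cut out of the original string with substr(), whitespace and self-closing slashes are preserved as they appear in the source.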
Nimantha
Zombo
0

Have you thought of using an off-the-shelf HTML parsing library? For context on using regex to parse HTML, see "RegEx match open tags except XHTML self-contained tags".
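
If a regex is still wanted despite those caveats, PCRE's recursion support can balance the nested tags, which a plain lazy (.*?) cannot do. The following is a rough sketch rather than a recommendation: the function name match_div is hypothetical, the class attribute must equal the given name exactly, attribute values are assumed not to contain ">", and large inputs may hit PCRE's backtracking or JIT stack limits (in which case preg_match returns false and the sketch returns null).

```php
<?php
// Hypothetical helper: matches one div (by exact class attribute value) together
// with all of its nested divs, using PCRE recursion to balance the tags.
function match_div(string $html_s, string $class_s): ?string {
   $re = '~<div\b[^>]*\bclass="' . preg_quote($class_s, '~') . '"[^>]*>' // target opening tag
       . '(?<body>(?:[^<]++'               // plain text
       . '|<(?!/?div\b)'                   // "<" of any non-div tag
       . '|<div\b[^>]*>(?&body)</div>'     // a balanced nested div (recursion)
       . ')*+)'
       . '</div>~si';                      // close of the target div
   // preg_match returns 1 on a match, 0 on no match, false on error.
   return preg_match($re, $html_s, $m) === 1 ? $m[0] : null;
}

// Usage on a small sample with an unclosed outer div:
$sample_s = '<div class="a"><div class="b"><p>x</p><div>y</div></div><div class="c">';
$found_s  = match_div($sample_s, 'b'); // '<div class="b"><p>x</p><div>y</div></div>'
```

The named group body matches any run of text, non-div tags, or complete nested divs, so the final </div> in the pattern can only pair with the target's own closing tag.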

0

Input

$html = '<div class="not-want">
        <div class="also-not-want">
    <div class="want-this-entire-div-class">
<button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
<div class="dropdown-menu j-dropdown">
<div class="header-search">
        <input type="text" name="search" value="" placeholder="Search entire site here..." 
class="search-input" data-category_id=""/>
  <button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
route=product/search&amp;search="></button>
</div>
</div>
</div>
<div class="not-want-this-also">
<div class="or-this">';

Code

$document   = new DOMDocument();            // Create DOM object
$document->loadHTML($html);                 // Load html into object
$class_name = "want-this-entire-div-class"; // Set class name to be found
$xpath      = new DOMXPath($document);      // Create XPath object
$node = $xpath->query("//div[@class='{$class_name}']")->item(0); // First div whose class attribute equals $class_name exactly
echo $document->saveHTML($node);            // Print result to page

Output

<div class="want-this-entire-div-class">
<button class="dropdown-toggle search-trigger" data-toggle="dropdown"></button>
<div class="dropdown-menu j-dropdown">
<div class="header-search">
        <input type="text" name="search" value="" placeholder="Search entire site here..." class="search-input" data-category_id=""><button type="button" class="search-button" data-search-url="https://www.xxxxcom/index.php? 
route=product/search&amp;search="></button>
</div>
</div>
</div>
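
The @class='...' test above only matches when the attribute is exactly that one class. If the div may carry several classes, a common XPath idiom pads the class list with spaces and looks for the name as a whole token. A minimal sketch, assuming a hypothetical helper name find_div_by_class and a class name that contains no quote characters:

```php
<?php
// Hypothetical helper: returns the serialized HTML of the first div that carries
// $class_s as one of its space-separated classes, or null if there is none.
function find_div_by_class(string $html_s, string $class_s): ?string {
   libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
   $document = new DOMDocument();
   $document->loadHTML($html_s);
   libxml_clear_errors();
   $xpath = new DOMXPath($document);
   // Pad the normalized class list with spaces so only whole tokens match.
   $query_s = "//div[contains(concat(' ', normalize-space(@class), ' '), ' {$class_s} ')]";
   $node = $xpath->query($query_s)->item(0);
   return $node === null ? null : $document->saveHTML($node);
}

// Usage: "want-this" is one of two classes on the inner div.
$sample_s = '<div class="outer"><div class="box want-this"><p>x</p></div></div>';
$found_s  = find_div_by_class($sample_s, 'want-this');
```

As the output above shows for the input element, saveHTML() re-serializes the parsed tree, so the result is not guaranteed to be byte-identical to the source.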
Steven