
I am using regex101.com to test my pattern, but I can't get the output I need. My sample can be viewed here: https://regex101.com/r/YQTW4c/2.

My regex is:

<table class=\"datatable\s\">(.*?)<\/table>

and the sample string:

<table class="datatable"><thead><tr><tr></thead></table>

I want to capture everything inside the table with class datatable, which in this example is <thead><tr><tr></thead>.

Am I missing something here? Any help would be much appreciated.

mickmackusa
wobsoriano

2 Answers


Your problem (as described by regex101) is that

"\s matches any whitespace character (equal to [\r\n\t\f\v ])"

So your regex requires a whitespace character between the e in datatable and the closing ", and the sample string has none. If you want to allow zero or more whitespace characters there, change your regex to

<table class=\"datatable\s*\">(.*?)<\/table>

Note that escaping " in a regex is not necessary (I presume the backslashes are there because your regex sits inside a quoted string).
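A quick way to check the corrected pattern in PHP is sketched below. It assumes the pattern is written as a single-quoted PHP string, so the double quotes need no backslashes; the ~ delimiter is my own choice so that the / in </table> does not need escaping either.

```php
<?php
// Sketch: run the corrected pattern with preg_match().
// Single-quoted string: " needs no escaping and \s is passed through as-is.
// '~' is used as the delimiter (an assumption) so '</table>' needs no '\/'.
$pattern = '~<table class="datatable\s*">(.*?)</table>~';
$html    = '<table class="datatable"><thead><tr><tr></thead></table>';

if (preg_match($pattern, $html, $m)) {
    echo $m[1], PHP_EOL; // $m[1] holds the first capture group (.*?)
}
```

With the original \s (no *), preg_match() returns 0 for this sample because there is no whitespace before the closing quote.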

What others have been saying about not using regex to parse HTML is very true. For example, this regex will fail if two tables with class "datatable" are nested. It will also fail if the table has additional classes, such as class="datatable striped". It is far better to use PHP tools built for the purpose, such as DOMDocument.
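Both failure cases can be reproduced with a short sketch. The sample strings below are made up for illustration; the pattern is the corrected one from above, written with a ~ delimiter.

```php
<?php
// Sketch: two inputs on which the regex approach breaks down.
$pattern = '~<table class="datatable\s*">(.*?)</table>~';

// 1. Nested tables: the lazy (.*?) stops at the FIRST </table>,
//    which belongs to the inner table, so the outer content is cut short.
$nested = '<table class="datatable"><tr><td>'
        . '<table class="datatable"><tr><td>inner</td></tr></table>'
        . '</td></tr></table>';
preg_match($pattern, $nested, $m);
echo $m[1], PHP_EOL; // truncated: ends right before the inner </table>

// 2. Extra classes: after "datatable" the pattern expects optional
//    whitespace and then a quote, so class="datatable striped" never matches.
$extra = '<table class="datatable striped"><thead></thead></table>';
var_dump(preg_match($pattern, $extra)); // int(0)
```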

Nick

Volunteers very often urge developers to use DOMDocument, but very seldom does anyone actually code up a working solution, so I will offer one that uses DOMDocument and XPath.

The table is targeted by its class: the XPath expression selects the table's child elements, and item(0) is the first of them. Passing that node to saveHTML() serializes it back to HTML.

Code:

$html = <<<HTML
<table class="datatable"><thead><tr><tr></thead></table>
HTML;

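$dom = new DOMDocument();
$dom->loadHTML($html);   // parse the markup into a DOM tree
$xpath = new DOMXPath($dom);
// Select the child elements of any table whose class attribute contains "datatable",
// then take the first of them (here, <thead>).
$node = $xpath->evaluate("//table[contains(@class, 'datatable')]/*")->item(0);
echo $dom->saveHTML($node);   // serialize just that node as HTML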

Output:

<thead>
<tr></tr>
<tr></tr>
</thead>

Note that DOMDocument "corrects" the output by adding the missing closing </tr> tags.
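item(0) returns only the table's first child element, which is enough here because <thead> is the only child. If a table can also contain <tbody> or other rows, one option is to concatenate every child node to get the table's full inner HTML. The sketch below assumes PHP 7 or later; datatableInnerHtml() is a hypothetical helper name and the sample markup is made up. It also swaps contains(@class, 'datatable') for the common concat()/normalize-space() class-token idiom, so that a class such as "datatable2" is not matched. Whitespace between tags in the output may differ between PHP/libxml versions.

```php
<?php
// Sketch: inner HTML of every table whose class list contains the token "datatable".
function datatableInnerHtml(string $html): array // hypothetical helper name
{
    $dom = new DOMDocument();
    // Collect libxml warnings about malformed markup (e.g. unclosed <tr>) instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Pad the class attribute with spaces so ' datatable ' only matches a whole class token.
    $query = "//table[contains(concat(' ', normalize-space(@class), ' '), ' datatable ')]";

    $results = [];
    foreach ($xpath->query($query) as $table) {
        $inner = '';
        foreach ($table->childNodes as $child) {
            $inner .= $dom->saveHTML($child); // serialize each child node in turn
        }
        $results[] = $inner;
    }
    return $results;
}

$html = '<table class="datatable striped"><thead><tr><th>A</th></tr></thead>'
      . '<tbody><tr><td>1</td></tr></tbody></table>'
      . '<table class="datatable2"><tbody><tr><td>skip me</td></tr></tbody></table>';

print_r(datatableInnerHtml($html));
```

Unlike the regex, this also picks up the table that carries an extra "striped" class.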

mickmackusa