
I want to read the value of every title attribute in an HTML document. Sample HTML below:

<html>
    <head>
        <title> </title>
    </head>
    <body>
        <div title="abc"> </div>
        <div> 
            <span title="abcd"> </span>
        </div>
        <input type="text" title="abcde">
    </body>
</html>

I tried the following regex call, which doesn't work:

preg_match('\btitle="\S*?"\b', $html, $matches);
0x5C91

2 Answers


Just to follow up on my comment: regexes aren't safe or robust enough for parsing HTML (although with some HTML there is little hope of anything working fully). Have a read of https://stackoverflow.com/a/1732454/1213708.

DOMDocument is a more reliable option. For the processing you are after, you can use XPath to find every title attribute with //@title (the @ sign is the XPath notation for an attribute).

$html = '<html>
<head>
   <title> </title>
</head>
 <body>
   <div title="abc"> </div>
   <div> 
           <span title="abcd"> </span>
   </div>
       <input type="text" title="abcde">
</body>
</html>';

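$doc = new DOMDocument();
// Collect libxml parse errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// //@title selects every title attribute node, wherever it appears
foreach ($xpath->query('//@title') as $attr) {
    echo $attr->textContent . PHP_EOL;
}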

which outputs...

abc
abcd
abcde
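
If you also need to know which element each title belongs to, one possible variation is to select the elements rather than the attribute nodes, using //*[@title]. This is only a sketch: the function name titlesByElement is made up for illustration, and the sample markup is a condensed copy of the HTML from the question.

```php
<?php
// Hypothetical helper: returns [tagName, titleValue] pairs for every
// element that carries a title attribute.
function titlesByElement(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse errors internally, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $result = [];
    // //*[@title] matches elements that have a title attribute,
    // so the <title> element in <head> is not included.
    foreach ($xpath->query('//*[@title]') as $element) {
        $result[] = [$element->nodeName, $element->getAttribute('title')];
    }
    return $result;
}

$html = '<html><head><title> </title></head><body>'
      . '<div title="abc"> </div><div><span title="abcd"> </span></div>'
      . '<input type="text" title="abcde"></body></html>';

foreach (titlesByElement($html) as [$tag, $title]) {
    echo $tag . ': ' . $title . PHP_EOL;
}
```

With the sample above this should print div: abc, span: abcd and input: abcde, one per line.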
Nigel Ren

Here's a regex solution:

// Capture the opening quote and reuse it with \1, so single- and double-quoted values both end at the right place
preg_match_all('~\stitle\s*=\s*(["\'])(?P<title>.*?)\1~s', $html, $matches);
foreach ($matches['title'] as $title) {
    echo $title . " ";
}
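
For comparison, the attempt in the question fails for reasons that have nothing to do with HTML: the pattern has no delimiters (PHP rejects a pattern that starts with a backslash), the trailing \b cannot match between the closing quote and the > or space that follows it, and preg_match stops after the first hit. A minimal sketch of a corrected version, assuming double-quoted values only and using a condensed copy of the sample markup:

```php
<?php
$html = '<div title="abc"> </div><span title="abcd"> </span>'
      . '<input type="text" title="abcde">';

// ~ ... ~ are the pattern delimiters; the capture group grabs everything
// between the double quotes. preg_match_all collects every match.
$count = preg_match_all('~\btitle="([^"]*)"~', $html, $matches);

// $matches[1] holds the captured value for each match.
echo $count . ': ' . implode(', ', $matches[1]) . PHP_EOL;
```

Note that \btitle would also match inside an attribute such as data-title, and the pattern knows nothing about tags, so it is still a rough tool next to an HTML parser.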
dabingsou