
I have been trying to get the inner text of an HTML tag from a URL (defimedia.info), but I only get one result. The code I tried is:

$html = file_get_contents("http://www.defimedia.info");
preg_match("'<h3>(.*?)<h3>'si", $html, $match);
echo($match[1]);

Even when I use foreach or try $match[2], it does not work. Any help would be appreciated.

regards
bhaamb

Bhaamb
  • Maybe using an HTML parser would be a good idea. Your regex will not match an h3 if it has a class. – Martin Gottweis Nov 23 '16 at 08:03
  • @MartinGottweis Sir, it doesn't have a class. – Bhaamb Nov 23 '16 at 08:15
  • I would use an HTML parser (like http://simplehtmldom.sourceforge.net) when parsing HTML instead of using regex; it is much simpler and easier to use, imho. It does all the heavy lifting for you. – Sitethief Nov 23 '16 at 08:24

2 Answers


preg_match stops after the first match, so you need the preg_match_all function, documented here: http://php.net/manual/en/function.preg-match-all.php

Try it like this:

<?php
$html = file_get_contents("http://www.defimedia.info");
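// The closing tag needs its slash (escaped here because / is the delimiter).
preg_match_all('/<h3>(.*?)<\/h3>/si', $html, $match);
// $match[0] holds the full matches, $match[1] the captured heading texts.
print_r($match);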
?>
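
The difference between the two functions shows up in the shape of the result array. Here is a minimal sketch, assuming a made-up inline HTML string (hypothetical, used so the example runs without fetching the site) and a pattern that ends in a closing `</h3>`:

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<h3>First headline</h3><p>text</p><h3>Second headline</h3>';

// preg_match stops after the first match.
preg_match('/<h3>(.*?)<\/h3>/si', $html, $first);

// preg_match_all collects every match; $all[1] holds the captured group.
preg_match_all('/<h3>(.*?)<\/h3>/si', $html, $all);

foreach ($all[1] as $title) {
    echo $title, "\n";
}
```

Here `$first[1]` contains only the first headline, while `$all[1]` is an array with both. With preg_match there is no `$first[2]`, because the pattern has a single capture group.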
tanaydin

Regex is not the right tool for parsing HTML/XML; you can use DOMDocument instead, like this:

$html = file_get_contents("http://www.defimedia.info");
$dom = new DOMDocument();

libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);

$h3s = $dom->getElementsByTagName('h3');
foreach ($h3s as $h3) {
    echo $h3->nodeValue."<br>";
}

Why did I use libxml_use_internal_errors(true)?
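
In short: real-world pages often contain markup that libxml considers invalid, and loadHTML then typically raises a PHP warning for each problem it finds. Calling libxml_use_internal_errors(true) buffers those problems instead, so they can be inspected or discarded. Here is a minimal sketch, assuming a made-up inline fragment (hypothetical, not taken from the site); whether a given fragment is reported at all depends on the libxml version:

```php
<?php
// Hypothetical fragment containing a tag that is not part of HTML,
// which libxml may report as a parse problem.
$html = '<html><body><h3>One</h3><madeup>x</madeup><h3>Two</h3></body></html>';

// Buffer parse errors internally instead of emitting PHP warnings.
// The call returns the previous setting so it can be restored later.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Fetch whatever was buffered, then clear the buffer and restore the setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Collect the text of every h3 element.
$titles = [];
foreach ($dom->getElementsByTagName('h3') as $h3) {
    $titles[] = $h3->nodeValue;
}

echo count($errors), " parse problem(s) buffered\n";
echo implode("\n", $titles), "\n";
```

Restoring the previous setting rather than hard-coding false is a design choice: it avoids changing error handling for other code that may have enabled buffering already.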

devmyb