PHP - preg_match unable to get all elements from html url

Question

I have been trying to get the inner text of an HTML tag from a URL (defimedia.info), but I only get one result. The code I tried is:

$html = file_get_contents("http://www.defimedia.info");
preg_match("'<h3>(.*?)<h3>'si", $html, $match);
echo($match[1]);

Even when I use foreach or try $match[2], it does not work. Any help would be appreciated.

regards
bhaamb

Maybe using an HTML parser would be a good idea. Your regex will not match an h3 if it has a class. — Martin Gottweis, Nov 23 '16 at 08:03
I would use an HTML parser (like http://simplehtmldom.sourceforge.net) instead of regex when parsing HTML; it is much simpler and easier to use, imho. It does all the heavy lifting for you. — Sitethief, Nov 23 '16 at 08:24

tanaydin · Accepted Answer · 2016-11-23T08:19:29.077

2

You need the preg_match_all function, documented here: http://php.net/manual/en/function.preg-match-all.php

Try it like this:

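<?php
$html = file_get_contents("http://www.defimedia.info");
// preg_match_all collects every match; preg_match stops after the first one.
// The closing tag needs its slash, escaped here because "/" is the delimiter.
preg_match_all('/<h3>(.*?)<\/h3>/si', $html, $match);
// $match[0] holds the full matches, $match[1] the captured inner texts.
print_r($match);
?>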
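
To see the difference without fetching the live page, here is a minimal sketch that runs both functions on a small inline string. The sample markup and the variable names are made up for illustration; they are not taken from defimedia.info.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<h3>First</h3><p>text</p><h3>Second</h3><h3>Third</h3>';

// preg_match stops after the first match.
preg_match('/<h3>(.*?)<\/h3>/si', $html, $first);

// preg_match_all keeps going and returns the number of full matches.
$count = preg_match_all('/<h3>(.*?)<\/h3>/si', $html, $all);

// $first[1] is a single string; $all[1] is an array of every captured text.
echo $first[1], "\n";
foreach ($all[1] as $title) {
    echo $title, "\n";
}
```

With preg_match there is no $first[2] because the pattern has only one capture group, which is why indexing further or looping over the result does not reveal more headings.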

edited Nov 23 '16 at 08:19

answered Nov 23 '16 at 08:03


score 0 · Answer 2 · edited May 23 '17 at 10:30

Regex is not the right tool for parsing HTML/XML; you can use DOMDocument instead, like this:

$html = file_get_contents("http://www.defimedia.info");
$dom = new DOMDocument();

libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_use_internal_errors(false);

$h3s = $dom->getElementsByTagName('h3');
foreach ($h3s as $h3) {
    echo $h3->nodeValue."<br>";
}
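
As one of the comments points out, a pattern like <h3>...</h3> misses headings that carry attributes. The sketch below runs the same DOMDocument approach on a made-up inline snippet (the markup and the class name are assumptions for illustration) to show that attributes and nested tags do not get in the way.

```php
<?php
// Hypothetical sample markup: one heading has a class, one has a nested link.
$html = '<html><body>'
      . '<h3 class="title">First</h3>'
      . '<h3><a href="#">Second</a></h3>'
      . '</body></html>';

$dom = new DOMDocument();

libxml_use_internal_errors(true);   // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();              // discard anything the parser collected
libxml_use_internal_errors(false);

$titles = [];
foreach ($dom->getElementsByTagName('h3') as $h3) {
    // textContent is the text of the element and all of its descendants.
    $titles[] = trim($h3->textContent);
}

print_r($titles);
```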

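Why did I use libxml_use_internal_errors(true)? Real-world HTML is rarely valid, and loadHTML() raises a PHP warning for each problem it finds. Enabling internal error handling makes libxml collect those errors quietly instead, so they do not clutter the output.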
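
A minimal sketch of what that call changes, assuming a made-up snippet the parser may complain about: with internal error handling enabled, loadHTML() does not emit warnings, and whatever was collected can be read back with libxml_get_errors(). Which messages appear, if any, depends on the libxml version installed.

```php
<?php
// Hypothetical markup; <foo> is not an HTML element, so some libxml
// versions report it as an error.
$html = '<html><body><foo>bar</foo><h3>Title</h3></body></html>';

$dom = new DOMDocument();

// Remember the previous setting so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
$errors = libxml_get_errors();      // array of LibXMLError objects
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}

// The document is still usable despite any reported problems.
$heading = $dom->getElementsByTagName('h3')->item(0)->textContent;
echo $heading, "\n";
```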
