
I was trying to get data from a webpage using PHP and file_get_contents along with regular expressions, but I can't seem to get the correct data from the page.

Here is my code:

<?php
   $homepage = file_get_contents('http://www.website.com');
   preg_match_all('/<p><b>(.*)<\/b><br>(.*)<br>(.*)<\/p>/ms', $homepage, $matches);
   $def = $matches[0];
   echo $def;
   ?>

My regular expressions aren't picking up anything, even though there is HTML on the page that matches them. As a test, I also tried replacing the preg_match_all call with the following one:

preg_match_all('/<div>(.*)<\/div>/ms', $homepage, $matches);

This only picked up 2 of the many div tags on the page. What is wrong with my code and what is the correct way it should be written?

Thanks

1 Answer


Instead of using regular expressions, you could simply use PHP's Document Object Model (DOM) extension:

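$homepage = file_get_contents('http://www.website.com');
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$DOM = new DOMDocument;
$DOM->loadHTML($homepage);
$items = $DOM->getElementsByTagName('div'); // every <div> in the document
$def = $items->item(0)->nodeValue; // text content of the first <div>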

(referenced from this question).
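
To pull out the paragraphs the question is after (a bold title followed by lines separated by `<br>`), one possible sketch is to pair DOMDocument with DOMXPath. The markup in `$html` is an assumption standing in for the real page, whose HTML isn't shown in the question, and the `//p[b]` query and variable names are only illustrative.

```php
<?php
// Hypothetical sample markup standing in for the fetched page; the real
// page's structure is not shown in the question, so this is an assumption.
$html = <<<HTML
<html><body>
<div><p><b>Title one</b><br>First line<br>Second line</p></div>
<div><p><b>Title two</b><br>Third line<br>Fourth line</p></div>
</body></html>
HTML;

// Collect parser warnings internally instead of printing them,
// since real-world HTML is often not well-formed.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$results = [];

// Select every <p> element that has a <b> child.
foreach ($xpath->query('//p[b]') as $p) {
    $parts = [];
    // The <b> element and the text on either side of each <br>
    // are separate child nodes of the <p>.
    foreach ($p->childNodes as $child) {
        $text = trim($child->textContent);
        if ($text !== '') {
            $parts[] = $text;
        }
    }
    $results[] = $parts;
}

print_r($results);
```

With a real page, `$html` would be replaced by the result of `file_get_contents()`. Each entry of `$results` then holds the bold title followed by the text between the `<br>` tags, which also avoids echoing an array directly as in the original snippet.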

Community
Godwin