
Possible Duplicate:
How to parse and process HTML with PHP?

I used this code to fetch HTML content from a given website URL.

**Code:**

Example URL: http://www.qatarsale.com/EnMain.aspx

$regexp = '/<div id="UpdatePanel4">(.*?)<\/div>/i';

@preg_match_all($regexp, @file_get_contents('http://www.qatarsale.com/EnMain.aspx'), $matches, PREG_SET_ORDER);

But $matches comes back as an empty array. I want to fetch all the HTML content found inside the div with id="UpdatePanel4".

If anybody has a solution, please suggest it.

Thanks

user1487757

3 Answers


First, make sure the server lets you fetch the data.

Second, use an HTML parser to parse the data instead of a regexp.

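// Fetch the page; file_get_contents() returns false on failure
$html = @file_get_contents('http://www.qatarsale.com/EnMain.aspx');
if ($html === false || $html === '') {
  die('Could not get the content!');
}
// Collect parser warnings instead of printing them; real-world HTML is rarely valid
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
// DOMElement for the div, or null if no element has that id
$content = $doc->getElementById('UpdatePanel4');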
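getElementById() returns a DOM node rather than a string, so one more step is needed to get the markup inside the div. A minimal sketch, assuming a made-up sample page in place of the real one (the sample markup and the $innerHtml variable are illustrative, not part of the original answer):

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = '<html><body><div id="UpdatePanel4"><p>Car <b>A</b></p>'
      . '<div class="row">Car B</div></div></body></html>';

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$panel = $doc->getElementById('UpdatePanel4');

// Serialize each child node and join the pieces to get the div's inner HTML.
$innerHtml = '';
if ($panel !== null) {
    foreach ($panel->childNodes as $child) {
        $innerHtml .= $doc->saveHTML($child);
    }
}

echo $innerHtml, PHP_EOL;
```

Because the parser builds a tree, the nested div inside UpdatePanel4 is kept whole instead of cutting the result short.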
xdazz
// Get the webpage
$html = @file_get_contents('http://www.qatarsale.com/EnMain.aspx');

$startingTag = '<div id="UpdatePanel4">';
// Find the position of '<div id="UpdatePanel4">'
$startPos = strpos($html, $startingTag);
if ($startPos === false) {
  die('UpdatePanel4 div not found!');
}
// The content begins right after the opening tag
$contentStart = $startPos + strlen($startingTag);
// Find the position of the first closing div after the opening tag
$endPos = strpos($html, '</div>', $contentStart);
// substr() takes a length as its third argument, not an end position
$contents = substr($html, $contentStart, $endPos - $contentStart);

You will have to do a bit more work if the UpdatePanel4 div contains nested divs, because this stops at the first </div> it finds.
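One way to do that extra work is to count nesting depth while scanning for opening and closing div tags. The following is only a sketch: innerHtmlOfDiv() is a hypothetical helper, and it assumes lowercase div tags and no "<div" text inside comments or scripts.

```php
<?php
/**
 * Returns the inner HTML of the div that starts with $startingTag,
 * or null if the tag or its matching </div> is not found.
 * Hypothetical helper for illustration only.
 */
function innerHtmlOfDiv(string $html, string $startingTag): ?string
{
    $startPos = strpos($html, $startingTag);
    if ($startPos === false) {
        return null;
    }
    $contentStart = $startPos + strlen($startingTag);
    $depth = 1;               // we are inside the starting div
    $offset = $contentStart;

    while ($depth > 0) {
        $nextOpen  = strpos($html, '<div', $offset);
        $nextClose = strpos($html, '</div>', $offset);
        if ($nextClose === false) {
            return null;      // unbalanced markup
        }
        if ($nextOpen !== false && $nextOpen < $nextClose) {
            $depth++;         // a nested div opens first
            $offset = $nextOpen + strlen('<div');
        } else {
            $depth--;         // a div closes first
            $offset = $nextClose + strlen('</div>');
        }
    }
    // $offset now sits just past the matching </div>.
    return substr($html, $contentStart, $offset - strlen('</div>') - $contentStart);
}

$sample = '<div id="UpdatePanel4"><div>inner</div> tail</div><div>other</div>';
echo innerHtmlOfDiv($sample, '<div id="UpdatePanel4">'), PHP_EOL;
// prints: <div>inner</div> tail
```

This still depends on the opening tag matching the literal string exactly, so it stays fragile compared with a real parser.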

Metalstorm

That just won't help. Even if you manage to get the regexp working, there are two issues with the way you are using it:

  • What if the server makes a minor change to the HTML, such as <div data-blah="blah" id="UpdatePanel4">? In that case you would have to change your regexp too.

  • Second issue: I think you want the innerHTML of the div, right? In that case, the regexp takes no account of nesting or the tree structure. The string you get runs from the opening tag you specify up to the first </div> that is encountered.
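Both issues can be reproduced on small, made-up snippets (the three sample strings below are assumptions, not the real page). The first case is an extra pitfall not listed above: without the "s" modifier, "." does not match newlines, which may be one reason the original $matches came back empty.

```php
<?php
$regexp = '/<div id="UpdatePanel4">(.*?)<\/div>/i';

// Hypothetical snippets, not the real page.
$multiline = "<div id=\"UpdatePanel4\">\n  <p>Car A</p>\n</div>";
$extraAttr = '<div data-blah="blah" id="UpdatePanel4">Car A</div>';
$nested    = '<div id="UpdatePanel4"><div>Car A</div><div>Car B</div></div>';

// 1. "." stops at newlines unless the "s" modifier is set, so nothing matches.
$countMultiline = preg_match_all($regexp, $multiline, $m1, PREG_SET_ORDER);

// 2. An extra attribute before id breaks the literal opening tag.
$countExtraAttr = preg_match_all($regexp, $extraAttr, $m2, PREG_SET_ORDER);

// 3. With nesting, the lazy match stops at the first </div>.
preg_match_all($regexp, $nested, $m3, PREG_SET_ORDER);

var_dump($countMultiline);  // int(0)
var_dump($countExtraAttr);  // int(0)
var_dump($m3[0][1]);        // string(10) "<div>Car A"
```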

Solution:

It is ALWAYS a bad idea to use Regexps to parse HTML. Use a DOMDocument instead.

UltraInstinct