
Possible Duplicate:
How to parse and process HTML with PHP?

I know the file_get_contents(url) function, but what I want is to use file_get_contents(url) first to pull the contents of a page, and then extract a certain block from those contents. Is there a method or function that can do that? Here's a sample:

So the code would look like this:

$pageContent = file_get_contents('http://www.pullcontentshere.com/');

and this would be the value of $pageContent:

<html> <body>
    <div id="myContent">
        <ul>    
            <li></li>
            <li></li>
            <li></li>
        </ul>
    </div> 
</body> </html>

Do you have any suggestions for how to extract just the <div id="myContent"> element together with all of its children?

So it would be something like this:

$content = function_here($pageContent);

so that $content would contain:

        <div id="myContent">
            <ul>    
                <li></li>
                <li></li>
                <li></li>
            </ul>
        </div> 

Answers are greatly appreciated!

PHP Noob

3 Answers


Another way would be to use a regular expression:

<?php

$string = '<html> <body> 
    <div id="myContent"> 
        <ul>     
            <li></li> 
            <li></li> 
            <li></li> 
        </ul> 
    </div>  
</body> </html>';

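// The /s modifier lets . match newlines; (.*?) is lazy, so the match
// ends at the first </div> after the opening tag.
if ( preg_match ( '/<div id="myContent"(.*?)<\/div>/s', $string, $matches ) )
{
    // $matches[0] is the whole div; $matches[1] is the captured group after '<div id="myContent"'
    foreach ( $matches as $key => $match )
    {
        // htmlentities() escapes the markup so the browser shows it as text
        echo $key . ' => ' . htmlentities ( $match ) . '<br /><br />';
    }
}
else
{
    echo 'No match';
}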

?>

Live example: http://codepad.viper-7.com/WSoWCh
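A caveat worth knowing with this pattern: the lazy (.*?) stops at the first </div> it finds, so if myContent contains a nested div, the match is cut short. The sketch below wraps the same regex in a small helper (the name extract_my_content is hypothetical, not from the answer) and shows both a flat and a nested case; it assumes the attribute is written exactly as id="myContent" and PHP 7.1 or later for the nullable return type.

```php
<?php

// Hypothetical helper: returns the whole <div id="myContent">...</div> match
// (index 0 of $matches), or null when the pattern does not match.
function extract_my_content(string $html): ?string
{
    if (preg_match('/<div id="myContent"(.*?)<\/div>/s', $html, $matches)) {
        return $matches[0];
    }
    return null;
}

// Flat markup: the lazy match reaches the div's own closing tag.
$flat = '<body><div id="myContent"><ul><li>a</li></ul></div></body>';
echo extract_my_content($flat), "\n";
// prints <div id="myContent"><ul><li>a</li></ul></div>

// Nested div: the match stops at the inner </div>, so the outer div is truncated.
$nested = '<body><div id="myContent"><div>inner</div><p>lost</p></div></body>';
echo extract_my_content($nested), "\n";
// prints <div id="myContent"><div>inner</div>
```

For markup like the sample in the question this is enough; once the target element can contain other divs, a real HTML parser is the safer choice.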

w00
  • Hi, I like your answer, short and to the point. But when I tried it, the result is displayed as text, not rendered as HTML. Do you know how to make it render as HTML? – PHP Noob May 26 '12 at 18:02
  • @PHPNoob Yes, just remove the htmlentities() call. – w00 May 26 '12 at 19:00

You can use the built-in SimpleXMLElement as explained in nullpointr's answer, or you can use regular expressions. Another solution, which I usually find pretty simple, is the PHP Simple HTML DOM Parser library, which lets you use jQuery-style selectors. A simple example with your code would look like this:

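// Load the library (simple_html_dom.php, downloaded from the project site)
include 'simple_html_dom.php';
// Create DOM from url
$html = file_get_html('http://www.pullcontentshere.com');
// '#' selects by id ('.' would select by class); the second argument 0
// returns the first match instead of an array of matches.
// outertext gives the element's HTML including the div itself.
$myContent = $html->find('div#myContent', 0)->outertext;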
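For comparison, the same lookup can be done without a third-party library using PHP's built-in DOM extension. This is a sketch, not part of the library's API: the helper name get_element_html_by_id is hypothetical and plays the role of the question's function_here(); the XPath expression //div[@id="..."] stands in for the CSS selector div#myContent and assumes the id contains no double quotes.

```php
<?php

// Hypothetical helper playing the role of the question's function_here():
// returns the outer HTML of the div with the given id, or null if none exists.
function get_element_html_by_id(string $html, string $id): ?string
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely perfect; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // XPath equivalent of the CSS selector div#myContent (assumes no quotes in $id).
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//div[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    // Passing a node to saveHTML() serialises just that node and its children.
    return $dom->saveHTML($nodes->item(0));
}

$pageContent = '<html> <body>
    <div id="myContent">
        <ul>
            <li></li>
            <li></li>
            <li></li>
        </ul>
    </div>
</body> </html>';

echo get_element_html_by_id($pageContent, 'myContent');
```

In a real script, $pageContent would come from file_get_contents() as in the question; the DOM extension ships with most PHP builds, so nothing extra needs to be downloaded.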
fanf

You can solve this with XML parsing. I would recommend SimpleXML, which is already part of PHP. Here's an example:

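$sitecontent = '<html>
   <body>
      <div id="myContent">
         <ul>
            <li></li>
            <li></li>
            <li></li>
         </ul>
      </div>
   </body>
</html>';

// SimpleXML parses the string as XML, so the markup must be well-formed
$xml = new SimpleXMLElement($sitecontent);
// Select only the div whose id is "myContent"
$xpath = $xml->xpath('//div[@id="myContent"]');

// asXML() returns the matched element and all of its children as a string
echo $xpath[0]->asXML();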
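One limitation to keep in mind: SimpleXMLElement only accepts well-formed XML, and real pages often contain markup such as an unclosed <br> that browsers accept but an XML parser rejects. The sketch below (the sample string is made up for illustration) shows the constructor failing on such input, then keeps the same xpath() approach by parsing with DOMDocument::loadHTML() first and converting the root element with simplexml_import_dom().

```php
<?php

// HTML that is valid for a browser but not well-formed XML (unclosed <br>).
$sitecontent = '<html><body><div id="myContent"><ul><li>one<br>two</li></ul></div></body></html>';

// Keep libxml warnings out of the output; parse errors are collected internally.
libxml_use_internal_errors(true);

// The SimpleXMLElement constructor throws an Exception on markup it cannot parse as XML.
try {
    new SimpleXMLElement($sitecontent);
    $parsedAsXml = true;
} catch (Exception $e) {
    $parsedAsXml = false;
}
echo $parsedAsXml ? "parsed as XML\n" : "not well-formed XML\n";

// Workaround: let the forgiving HTML parser build the tree, then hand the
// root element to SimpleXML so xpath() can be used as before.
$dom = new DOMDocument();
$dom->loadHTML($sitecontent);
libxml_clear_errors();
$xml = simplexml_import_dom($dom->documentElement);
$divs = $xml->xpath('//div[@id="myContent"]');
echo count($divs), "\n";
```

Importing $dom->documentElement rather than the document itself keeps the call on an element node, which is what simplexml_import_dom() expects.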
nullpointr