22

I want to get a DIV from an external website with pure PHP.

External website: http://www.isitdownrightnow.com/youtube.com.html

Div text I want from isitdownrightnow (statusup div): <div class="statusup">The website is probably down just for you...</div>

I already tried file_get_contents with DOMDocument and str_get_html, but I could not get it to work.

For example, this:

$page = file_get_contents('http://css-tricks.com/forums/topic/jquery-selector-div-variable/');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    // Loop through the DIVs looking for one with a class of "bbp-template-notice",
    // then echo out its contents (pardon the pun)
    if ($div->getAttribute('class') === 'bbp-template-notice') {
        echo $div->nodeValue;
    }
}

It just displays this error in the browser console:

Failed to load resource: the server responded with a status of 500 (Internal Server Error)
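A 500 with no visible output usually means PHP hit an error that the server is configured not to display. As a minimal sketch, assuming you are allowed to change these settings at runtime with ini_set(), the code below makes the underlying error visible. The two checks are common causes of this symptom, not a diagnosis of any particular server, and fetch_or_fail is a hypothetical helper name.

```php
<?php
// While debugging, print errors instead of a bare "500 Internal Server Error"
// (turn this off again on a public server)
ini_set('display_errors', '1');
error_reporting(E_ALL);

// new DOMDocument() is a fatal error if the DOM extension is not installed
if (!extension_loaded('dom')) {
    echo "The DOM extension is not loaded\n";
}
// file_get_contents() can only open http:// URLs when allow_url_fopen is on
if (!ini_get('allow_url_fopen')) {
    echo "allow_url_fopen is disabled\n";
}

/**
 * Like file_get_contents(), but throws instead of quietly returning false,
 * so a failed download cannot turn into a silently blank page.
 */
function fetch_or_fail(string $url): string
{
    $body = file_get_contents($url);
    if ($body === false) {
        throw new RuntimeException("Could not fetch $url");
    }
    return $body;
}
```

You would then call `$page = fetch_or_fail('http://css-tricks.com/forums/topic/jquery-selector-div-variable/');` in place of the bare file_get_contents() call.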

Kallewallex
  • Well, it has to load... so I'm guessing it is dynamically generated with JS, which makes this very difficult. – markasoftware Dec 07 '13 at 21:04
  • If you tried `file_get_contents` et al, please show your code and explain what didn't work. – Mike Dec 07 '13 at 21:08
  • @Markasoftware why would that be very difficult? https://requestable.pieterhordijk.com/cBg2b – PeeHaa Dec 07 '13 at 21:09
  • @OP you really need to show us what specific problem you are having, or you cannot be helped. "I could not get it to work." is not a valid problem description. – PeeHaa Dec 07 '13 at 21:10
  • You could curl the page, save its contents, load the content into a `DOMDocument` object and traverse the tree with `DOMXPath`. – Darragh Enright Dec 07 '13 at 21:11
  • @PeeHaa that is for a different URL. If he did that, it would work, but the exact URL in the question wouldn't. – markasoftware Dec 07 '13 at 21:15
  • OP doesn't say he wants to use that URI. He just wants the result. – PeeHaa Dec 07 '13 at 21:15
  • Thank you guys for answering. Actually, I just chose this site as an example, since I don't have anything on the web myself. It could be any other site, even a simple HTML file. @PeeHaa I deleted it because it got really messy; mostly, if I echoed my result, it was just blank. – Kallewallex Dec 07 '13 at 21:18
  • You still need to tell us your problem... Related: http://sscce.org/ – PeeHaa Dec 07 '13 at 21:19
  • Yes, just give me a minute I'll reproduce it and update the post – Kallewallex Dec 07 '13 at 21:27
  • Check the error log to find out why it is throwing a 500 error. – PeeHaa Dec 07 '13 at 21:45
  • The element you are trying to fetch is actually reloaded by an ajax call (http://www.isitdownrightnow.com/check.php?domain=youtube.com) so this is kinda pointless on this url. – worenga Dec 07 '13 at 21:46
  • @mightyuhu what about the second one I added (css-tricks.com)? It can be any URL. I am not working on a project or anything like that, just trying to learn a bit of PHP. – Kallewallex Dec 07 '13 at 21:51
  • Works for me (http://phpfiddle.org/main/code/8i4-0vb), check your server configuration. – worenga Dec 07 '13 at 21:58
  • Link update: http://phpfiddle.org/main/code/278-fki If you get a 500 error while running your script, adjust your display_errors configuration; see http://www.php.net/manual/en/errorfunc.configuration.php – worenga Dec 07 '13 at 22:21
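The comments above suggest fetching the page with cURL and traversing it with DOMXPath, and point out that the statusup text actually comes from an AJAX call to check.php. A minimal sketch of that combination follows, assuming PHP 7.1+ with the cURL and DOM extensions installed. fetch_with_curl and extract_text are hypothetical helper names, and the User-Agent string is an arbitrary example value, not something the site is known to require.

```php
<?php
/**
 * Fetch a page with cURL, sending a browser-like User-Agent header
 * (the header value is an arbitrary example).
 */
function fetch_with_curl(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 10,   // seconds
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-scraper)',
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    if ($status >= 400) {
        throw new RuntimeException("HTTP status $status for $url");
    }
    // The handle is released automatically when $ch goes out of scope
    return $body;
}

/**
 * Return the trimmed text of the first node matching an XPath query,
 * or null when nothing matches.
 */
function extract_text(string $html, string $query): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about invalid markup instead of printing them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}
```

Usage against the endpoint from the comments would be `extract_text(fetch_with_curl('http://www.isitdownrightnow.com/check.php?domain=youtube.com'), "//div[@class='statusup']")`; whether that endpoint still returns this markup is not guaranteed.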

4 Answers

65

This is what I always use:

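$url = 'https://somedomain.com/somesite/';
$content = file_get_contents($url);
// Everything after the opening tag ends up in $first_step[1]
$first_step = explode( '<div id="thediv">' , $content );
if (isset($first_step[1])) {
    // Cut at the first closing tag; this breaks if the div contains nested divs
    $second_step = explode("</div>" , $first_step[1] );
    echo $second_step[0];
}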
The Codesee
zk_mars
  • **It does work for me on some sites.** However, on the site I am trying to scrape it does not work... any idea? – Kallewallex Dec 20 '13 at 21:57
  • I can't tell without the domain. But it is possible that the content you are trying to get is not generated when you fetch the page this way instead of visiting it in a browser. You can experiment with an HTTP client/debugger; I am using Paw. Just send a request and change the header information. You can then see the output and check whether your div's content gets displayed. – zk_mars Dec 20 '13 at 22:04
  • Finally. Okay, I tried it. It only displays the div if I modify the headers. Thanks a lot. – Kallewallex Dec 20 '13 at 22:14
  • There are so many better ways to do this than string manipulation. If they add a new class to that HTML, or make any sort of minor tweak then you're screwed. Try goutte https://github.com/FriendsOfPHP/Goutte – Phil Sturgeon Dec 19 '14 at 16:32
  • It's OK, but what about child content? If the div contains other divs, there are multiple closing </div> tags and this cuts off at the first one (the code is only correct for a div with no nested divs). – Hiren Kubavat Jun 25 '15 at 11:01
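The nested-div problem raised in the last comment goes away if a parser matches the tags instead of explode(). The following is a minimal sketch, not the answerer's method: it swaps in DOMDocument plus DOMXPath to return the inner HTML of the div with id "thediv", assuming PHP 7.1+ with the DOM extension. inner_html_of_div is a hypothetical helper name.

```php
<?php
/**
 * Return the inner HTML of the first <div> with the given id, or null.
 * Unlike explode() on '</div>', the parser knows which closing tag
 * belongs to which div, so nested divs are kept intact.
 */
function inner_html_of_div(string $html, string $id): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence invalid-markup warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // This simple version assumes $id contains no single quote
    $nodes = $xpath->query("//div[@id='$id']");
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $inner = '';
    foreach ($nodes->item(0)->childNodes as $child) {
        $inner .= $doc->saveHTML($child); // serialize each child node
    }
    return $inner;
}
```

Usage: `echo inner_html_of_div(file_get_contents('https://somedomain.com/somesite/'), 'thediv');`. If you only need the text, `$nodes->item(0)->textContent` is simpler than serializing the children.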
18

This may be a little overkill, but you'll get the gist.

<?php 

$doc = new DOMDocument;

// We don't want to bother with whitespace
$doc->preserveWhiteSpace = false;

// Most HTML developers are chimps and produce invalid markup...
$doc->strictErrorChecking = false;
$doc->recover = true;

$doc->loadHTMLFile('http://www.isitdownrightnow.com/check.php?domain=youtube.com');

$xpath = new DOMXPath($doc);

$query = "//div[@class='statusup']";

$entries = $xpath->query($query);
// item(0) is null when nothing matched, so check before reading it
if ($entries->length > 0) {
    var_dump($entries->item(0)->textContent);
}

?>
worenga
  • This actually works. Awesome. How do I get it without the "string(XX)" and just get the text in a var? – Kallewallex Dec 07 '13 at 21:54
  • Change var_dump to an assignment like `$var = $entries->item(0)->textContent` – worenga Dec 07 '13 at 21:57
  • Thank you very much. That did it. I played around with it, but I really have trouble using it on other websites; sometimes it works, sometimes it does not. For example, I am trying to get `<h2 class="success">Yes.</h2>`, but using `"//h2[@class='success']";` did not work. – Kallewallex Dec 07 '13 at 22:19
  • Hard to say without any further details about the specific URL. – worenga Dec 07 '13 at 22:24
  • `$var = $xpath->evaluate('string(//div[@class="statusup"])');` would return the text content directly as a string. – ThW Dec 07 '13 at 23:04
  • It works just fine but I get a lot of warnings when using it: " htmlParseEntityRef: expecting ';'", "ID ... already defined in ...", "htmlParseEntityRef: no name" and "Unexpected end tag" - is there a workaround for this without disabling the error messages? – user2718671 Aug 19 '14 at 08:33
  • see https://stackoverflow.com/questions/1148928/disable-warnings-when-loading-non-well-formed-html-by-domdocument-php – worenga Aug 20 '14 at 12:25
  • @worenga how can I fetch all the values here, from item(0) through the last item? – Mr. Bhosale Apr 24 '17 at 11:42
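To answer the last question: the result of DOMXPath::query() is a DOMNodeList, which foreach can iterate directly, so there is no need to call item() with each index. A minimal sketch, assuming PHP 7.1+; all_texts is a hypothetical helper name.

```php
<?php
/**
 * Collect the trimmed text of every node matched by an XPath query,
 * i.e. item(0) through item($nodes->length - 1), in document order.
 */
function all_texts(string $html, string $query): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence invalid-markup warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($doc))->query($query);
    if ($nodes === false) { // malformed XPath expression
        return [];
    }
    $texts = [];
    foreach ($nodes as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}
```

An equivalent index-based loop is `for ($i = 0; $i < $nodes->length; $i++) { $nodes->item($i)->textContent; }`.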
3

I used the XPath method proposed by @mightyuhu, and it worked great with his suggested assignment. Depending on the web page you get the information from, and on whether the tag you want has an 'id' or 'class' that identifies it, you will have to change the query. If the tag has an 'id', you can use this (the example extracts the USD exchange rate):

$query = "//div[@id='USD']";

However, site developers rarely make it that easy, so there will usually be several more unnamed tags to dig through. In my example:

<div id="USD" class="tab">
  <table cellspacing="0" cellpadding="0">
    <tbody>
      <tr>
        <td>Ask Rate</td>
        <td align="right">1.77400</td>
      </tr>
      <tr class="even">
        <td>Bid Rate</td>
        <td align="right">1.70370</td>
      </tr>
      <tr>
        <td>BNB Fixing</td>
        <td align="right">1.735740</td>
      </tr>
    </tbody>
  </table>
</div>

So I had to change the query to get the 'Ask Rate':

$doc->loadHTMLFile('http://www.fibank.bg/en');
$xpath = new DOMXPath($doc);
$query = "//div[@id='USD']/table/tbody/tr/td";

The query above matches every td cell in the table, so I took item 1 instead of item 0 to get the second cell, which holds the exchange rate (the first cell contains the text 'Ask Rate'):

$entries = $xpath->query($query);
$usdrate = $entries->item(1)->textContent;
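Counting flat td indices gets fragile once you want more than one value (Bid Rate would be item 3, BNB Fixing item 5). As an alternative sketch, not the answerer's code, each row can be read as label => value by running XPath relative to the row. It assumes the two-cells-per-row layout shown above and PHP 7.1+; rates_by_label is a hypothetical helper name.

```php
<?php
/**
 * Turn each row of the table inside <div id="..."> into label => value,
 * assuming one label cell followed by one value cell per row.
 */
function rates_by_label(string $html, string $divId): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence invalid-markup warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rates = [];
    // //tr (rather than /table/tbody/tr) also matches when the source has no <tbody>
    foreach ($xpath->query("//div[@id='$divId']//tr") as $row) {
        // Evaluated relative to the current row: its first and second cells
        $label = trim($xpath->evaluate('string(td[1])', $row));
        $value = trim($xpath->evaluate('string(td[2])', $row));
        if ($label !== '') {
            $rates[$label] = $value;
        }
    }
    return $rates;
}
```

With the HTML above, `rates_by_label($html, 'USD')['Ask Rate']` gives the same value as `$entries->item(1)->textContent`.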

Another method is to reference the value directly within the query. When the tags have no id or class, you do this by indexing the tags by position. I picked this up from my Maxthon browser's "Inspect element" feature combined with the "Copy XPath" context-menu option (neat, yeah?):

'//*[@id="USD"]/table/tbody/tr[1]/td[2]'

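Notice that it also inserts an asterisk (*) after the //. In XPath, * matches an element with any name, so //*[@id="USD"] selects whichever element has the id USD. In this case you get the value with item(0) again, since the query matches only one node. One caveat: browsers add a tbody element to every table in the DOM they show you, even when the HTML source has none, while DOMDocument parses the source as-is, so an XPath copied from the browser may need its /tbody step removed.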
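Following ThW's comment on the previous answer, a positional query like this can be wrapped in XPath's string() and passed to evaluate(), which returns the text directly with no item() call. A minimal sketch under those assumptions (PHP 7.1+, one table inside the element with the given id); cell_text is a hypothetical helper name, and row/column numbers are 1-based as in XPath.

```php
<?php
/**
 * Read one table cell by position (1-based row and column) inside the
 * element with the given id, returning its trimmed text ('' if absent).
 */
function cell_text(string $html, string $id, int $row, int $col): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence invalid-markup warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // //*[@id=...] matches any element with that id; tr[n]/td[m] pick by position.
    // //tr instead of /table/tbody/tr avoids depending on a <tbody> in the source.
    $query = sprintf('string(//*[@id="%s"]//tr[%d]/td[%d])', $id, $row, $col);
    return trim($xpath->evaluate($query));
}
```

For the table above, `cell_text($html, 'USD', 1, 2)` corresponds to the copied `tr[1]/td[2]` query.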

If needed, you can further process the extracted string, for example by changing the number format to match your preference:

$usdrate = number_format($usdrate, 5, ',', ' ');

I hope someone finds this helpful, as I found the answers above, and that it saves them time searching for the correct query and syntax.

-3
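$contents = file_get_contents($url);

// Everything after the opening tag ends up in $title[1]
$title = explode('<div class="entry-content">', $contents);
// Keep only the part before the first closing </div>
$title = explode('</div>', $title[1]);

// Write the fragment to s.php, then include it.
// Caution: require_once executes any PHP code contained in the fetched content.
$fp = fopen('s.php', 'w+');
fwrite($fp, $title[0]);
fclose($fp);
require_once('s.php');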
  • Why in the world do you use fopen/fwrite/require_once? Also, you are duplicating the accepted answer? – Sjon Sep 21 '15 at 17:25
  • Thank you for posting an answer to this question! Code-only answers are discouraged on Stack Overflow, because it can be difficult for the original poster (or future readers) to understand the logic behind them. Please edit your answer and include an explanation of your code so that others can benefit from it. Thanks! – Maximillian Laumeister Sep 22 '15 at 01:37