I've written a PHP script to get the HTML content (source code) of a webpage, but it doesn't do what I expected. When I run the script, the browser renders the page itself instead of showing its markup. How can I get the HTML elements or source code?

This is the script:

<?php
include "simple_html_dom.php";

// Fetch a page with cURL and load it into a simple_html_dom object.
function get_source($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the response as a string instead of printing it directly.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $htmlContent = curl_exec($ch);
    curl_close($ch);
    $dom = new simple_html_dom();
    $dom->load($htmlContent);
    return $dom;
}

$scraped_page = get_source("https://stackoverflow.com/questions/tagged/web-scraping");
// Echoing the object invokes its __toString(), which outputs the raw HTML.
echo $scraped_page;
?>

Currently I'm getting output like this:

[image 1: screenshot not preserved]

My expected output is something like:

[image 2: screenshot not preserved]

Btw, echoing $htmlContent also gives me what you can see in image 1.

SIM
  • The last line of code `echo $scraped_page;` displays the document you've loaded, so you should be able to use this to extract the data instead. – Nigel Ren Sep 18 '18 at 18:59
  • Yes, I know but how can I get the source code then? Thanks for your comment @Nigel Ren. – SIM Sep 18 '18 at 19:00
  • That is the source code, not sure what you are expecting to get? If you want to display the source - either put `echo '
    ';` before and `echo '
    ';` after the echo. Or view the source in your browser.
    – Nigel Ren Sep 18 '18 at 19:02
  • Read the docs on the library you're using. The reason that you're getting what you're getting is because the object you're echoing has a [`__toString()`](https://sourceforge.net/p/simplehtmldom/code/HEAD/tree/trunk/simple_html_dom.php#l1707) function that just returns the bare source. If you want to do something else you need to *do something else*. – Sammitch Sep 18 '18 at 19:02
  • I never asked why don't I get source code using my above script; rather, I asked how I can get them, meaning which way. The above script is just a placeholder to let you know that I tried myself before making a post. Thanks. – SIM Sep 18 '18 at 19:07
  • Please give an example of the desired output. – Nima Sep 18 '18 at 19:09
  • Possible duplicate of [PHP Parse HTML code](https://stackoverflow.com/questions/3627489/php-parse-html-code) – Nigel Ren Sep 18 '18 at 19:10
  • What we see when we ***inspect element*** or click the ***View page source*** button. – SIM Sep 18 '18 at 19:16
  • This is the most basic thing that other languages provide in the first place. However, the `Possible duplicate` flag is wrongly applied here, since the question there is totally different from what I've asked. Thanks anyway. – SIM Sep 18 '18 at 19:23
  • Is `echo $scraped_page` not showing what you expected? What is it showing? What did you expect? If the curl request succeeded, it should be showing you some HTML. If it isn't, you probably need to find out why the request failed, or what else went wrong with your script. "Didn't succeed" as a description of your problem doesn't really give us much to go on. What do you mean by "opens the page itself"? Which page? Opens how, exactly? You're just echoing the result of the curl request, that's all. We would really like to help, but we need you to be more specific about your problem. Thank you. – ADyson Sep 18 '18 at 19:26
  • It strikes me that if you want the _raw HTML_ returned by the curl request, you should echo `$htmlContent` rather than `$dom`, which is likely to be an object. – ADyson Sep 18 '18 at 19:30
  • Please check out the edit @ADyson. – SIM Sep 18 '18 at 19:39
  • Ok thanks. I guess that's because you are echoing it into an existing HTML document, so the browser treats it like any other HTML which forms part of the page, i.e. it parses it and renders it. I didn't know if you were executing this from the command line, or maybe echoing it into a textbox, or anything else. Now we have some context. If you want to see the raw HTML in this context, you need to HTML-encode it so the browser sees it as text and not as HTML to be interpreted and rendered. – ADyson Sep 18 '18 at 19:43
  • There are potentially a couple of different ways to do that. See https://www.google.co.uk/search?q=php+display+html+code+on+page&oq=PHP+display+HTML+code&aqs=chrome.1.69i57j0l5.3296j0j7&sourceid=chrome&ie=UTF-8 – ADyson Sep 18 '18 at 19:43
  • Can you please be more clear about the expected output? Like providing an example of desired output in text form, not an image. – Nima Sep 18 '18 at 19:49
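
The HTML-encoding approach ADyson describes in the comments could look roughly like the sketch below. It is only an illustration, not the asker's code: `render_source_as_text()` is a hypothetical helper name, and the sample `$htmlContent` string stands in for whatever `curl_exec()` returned in the question. It relies on PHP's built-in `htmlspecialchars()`, which turns `<`, `>`, `&`, and quotes into entities so the browser prints them literally.

```php
<?php
// Hypothetical helper (not from the original post): escapes markup so a
// browser displays it as literal text instead of rendering it.
function render_source_as_text(string $html): string
{
    // htmlspecialchars() converts &, <, >, and quotes into HTML entities.
    $escaped = htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // <pre> keeps the original line breaks and indentation visible.
    return '<pre>' . $escaped . '</pre>';
}

// Stand-in for the string returned by curl_exec() in the question.
$htmlContent = "<h1 class=\"title\">Hello</h1>\n<p>Tom & Jerry</p>";

echo render_source_as_text($htmlContent);
```

In the question's script, this would presumably mean passing `$htmlContent` (or the string form of `$scraped_page`) through the helper before echoing it. Another option, if the script outputs nothing else, is to send a `Content-Type: text/plain` header before echoing, so the browser does not treat the response as HTML at all.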

0 Answers