Parsing any webpage using CURL on PHP

Question

Is it possible to write a PHP function that returns the HTML of any given link as a string, the same way a browser would fetch it? Example links: "http://google.com", "", "mywebsite.com", "somesite.com/.page/nn/?s=b#85452", "lichess.org"

What I've tried:

$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, false);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_SSLVERSION, 3);
curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$data = curl_exec($curl);
if(curl_errno($curl)){
    echo 'Curl error: ' . curl_error($curl);
}
echo $data;
curl_close($curl);

Unfortunately, for some links this code returns a blank page, possibly because of SSL or some other problem, while for other links it works.

Or is there an alternative to CURL? I just do not understand why PHP cannot retrieve arbitrary HTML out of the box.

If you truly want to do some parsing using PHP, you should look into PHP's DOM classes (e.g. DOMElement). – Altaf Hussain May 08 '17 at 00:44
I tried using file_get_contents, and again it works for some links but not for others. – rint May 08 '17 at 00:45
That's really not enough info. Why doesn't it work? What errors are you receiving? Debug your code. – Devon Bessemer May 08 '17 at 00:46

score 1 · Answer 1 · answered May 08 '17 at 00:47

CURL may fail on SSL sites if you're running an older version of PHP. Make sure your OS and PHP version are up-to-date.

You may also opt to use file_get_contents(), which works with URLs (provided allow_url_fopen is enabled) and is generally a simpler alternative if you just want to make simple GET requests.

$html = file_get_contents('https://www.google.com/');
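As a hedged sketch of how the one-liner above might be made easier to debug, the helper below passes a stream context and throws when the request fails, so the reason is not silently lost. The function name `fetchWithStreams`, the timeout, and the user-agent string are assumptions for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper (name and defaults are assumptions for illustration).
function fetchWithStreams(string $url, int $timeout = 20): string
{
    // The 'http' context options also apply to https:// URLs.
    $context = stream_context_create([
        'http' => [
            'method'          => 'GET',
            'timeout'         => $timeout,
            'follow_location' => 1,
            'ignore_errors'   => true, // return the body even for 4xx/5xx responses
            'user_agent'      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
        ],
    ]);

    // Suppress the warning and inspect the failure ourselves.
    $html = @file_get_contents($url, false, $context);
    if ($html === false) {
        $error = error_get_last();
        throw new RuntimeException($error['message'] ?? "Could not read $url");
    }

    return $html;
}
```

Calling `fetchWithStreams('https://www.google.com/')` inside a try/catch then surfaces messages such as "failed to open stream" as an exception message rather than a bare warning.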

Alex Howansky


I tried using this function, but again it does not always work. Example: http://stackoverflow.com/questions/3535799/file-get-contents-failed-to-open-stream They just advise using CURL instead, and with CURL I cannot open every link either – rint May 08 '17 at 00:56
http://twitter.com/statuses/user_timeline.rss?screen_name=google&count=6 – rint May 08 '17 at 00:58
It is from the SO question above – rint May 08 '17 at 00:58
There's no such page there, it's a 404. – Alex Howansky May 08 '17 at 00:58
OK, another link: http://my.nu.edu.kz/.AccountPage/StudentCard?uid=201201518 It should redirect to the login page. Sometimes file_get_contents returns "failed to open stream: HTTP request failed" – rint May 08 '17 at 01:04
CURL doesn't emulate a browser, it gets one URL. The page you're trying to hit is doing things with cookies and sessions to keep state. If you don't manage those things manually, CURL isn't going to work. Run `curl_getinfo()` to get the details of what happened with a particular request. In this case, it's expecting a cookie that you didn't provide. – Alex Howansky May 08 '17 at 01:13
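A minimal sketch of what the comment above describes, assuming a writable cookie file: CURL keeps cookies across redirects via `CURLOPT_COOKIEJAR` / `CURLOPT_COOKIEFILE`, and `curl_getinfo()` reports what happened with the request. The function name `fetchWithCookies` and the shape of the returned array are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper (name and return shape are assumptions for illustration).
function fetchWithCookies(string $url, string $cookieFile): array
{
    $curl = curl_init($url);
    curl_setopt_array($curl, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_CONNECTTIMEOUT => 20,
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here and enable cookie handling
        CURLOPT_COOKIEJAR      => $cookieFile, // write cookies here when the handle is freed
    ]);

    $body = curl_exec($curl);

    // Collect the details needed to see why a request did not behave as expected.
    $result = [
        'body'           => $body === false ? null : $body,
        'error'          => curl_error($curl),
        'http_code'      => curl_getinfo($curl, CURLINFO_HTTP_CODE),
        'effective_url'  => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL),
        'redirect_count' => curl_getinfo($curl, CURLINFO_REDIRECT_COUNT),
    ];

    unset($curl); // free the handle so the cookie jar is flushed to disk

    return $result;
}
```

Inspecting `http_code`, `effective_url`, and `redirect_count` for a page like the one in the previous comment would show whether the request ended on a login page or an error status. Sites that also depend on JavaScript are beyond what this sketch handles.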
Also note, using `curl_setopt($curl, CURLOPT_SSLVERSION, 3);` will artificially limit you. If a site offers only TLS, you won't be able to connect to it. – Alex Howansky May 08 '17 at 01:16
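For illustration, here is a sketch of the question's options with the `CURLOPT_SSLVERSION` line dropped so that the library can negotiate the protocol version with the server. Turning certificate verification back on is an additional assumption of this sketch, not something the comment above asks for. The names `buildFetchOptions` and `fetchPage` and the user-agent string are hypothetical.

```php
<?php
// Hypothetical helper: options similar to the question's, without the pinned SSL version.
function buildFetchOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 20,
        // No CURLOPT_SSLVERSION here: the protocol version is negotiated automatically.
        CURLOPT_SSL_VERIFYPEER => true, // verify the certificate chain (assumption of this sketch)
        CURLOPT_SSL_VERIFYHOST => 2,    // verify the certificate matches the host name
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)',
    ];
}

// Hypothetical wrapper: returns the response body or throws with CURL's error message.
function fetchPage(string $url): string
{
    $curl = curl_init();
    curl_setopt_array($curl, buildFetchOptions($url));

    $data = curl_exec($curl);
    if ($data === false) {
        throw new RuntimeException('Curl error: ' . curl_error($curl));
    }

    return $data;
}
```

Keeping the options in a separate function makes it easy to check which settings are actually sent, and throwing on failure avoids echoing an empty page when the request did not succeed.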
