Parse HTML page using cURL

Question

I want to parse an external HTML page using cURL. This is my simple code:

$ch=curl_init($url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch,CURLOPT_HEADER,0);
$data=curl_exec($ch);
curl_close($ch);

But I don't know how to access and echo the tag I want (for example, a div with class="news"). Note: I don't want to use simple_html_dom. It's slower than cURL and causes some errors for me.

[Don't try to parse HTML](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) – Tobias Golbs, Jan 21 '14 at 13:58
cURL has nothing to do with parsing, it is just a library for fetching data over the network. The statement that simple_html_dom is slower than cURL therefore makes no sense, and this question is too open-ended to be answerable (it boils down to "how should I find something in HTML?") – IMSoP, Jan 21 '14 at 14:13
@AliN11 You need to be more specific about what you're asking. There are lots of ways to parse HTML, but asking for all of them is too broad a question for this site. You mention you had problems with `simple_html_dom`, but don't say what they are; perhaps you could ask a question to help solve those problems? Either way, cURL is definitely not relevant, as it just gives you a string of HTML, just like if you'd loaded it from a local file. – IMSoP, Jan 21 '14 at 15:08

Hüseyin BABAL · Answer 1 · 2014-01-21T14:23:19.063

1

simple_html_dom is not slow. You can do what you want like this:

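<?php
include_once('simple_html_dom.php');

$url = ''; // Put the URL of the page to crawl here
$news = array();

$html = file_get_html($url); // fetches the page and parses it in one step
if (!$html) {
    die('Could not fetch or parse ' . $url);
}

// Select only the divs that carry the class "news"
foreach ($html->find('div.news') as $div) {
    // Collect the text of each matching div, not its class attribute
    $news[] = $div->plaintext;
}

echo implode("\n", $news);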

edited Jan 21 '14 at 14:23

answered Jan 21 '14 at 14:11

You need to assign a URL to $url. Also make sure that you have simple_html_dom.php – Hüseyin BABAL Jan 21 '14 at 14:22
failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error – AliN11 Jan 21 '14 at 14:24
Please assign a valid URL to $url; see my updated answer – Hüseyin BABAL Jan 21 '14 at 14:25
In case you are not familiar with it, "failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error" means your URL is unreachable or invalid – Hüseyin BABAL Jan 21 '14 at 14:28
But you said that the page is HTML, and this page is XML. Please update your question – Hüseyin BABAL Jan 21 '14 at 14:32
There is no "div" with class "news". – Hüseyin BABAL Jan 21 '14 at 14:36
Try `div.desc`. It makes no difference whether it is `news` or `desc`. I can't open this URL using this method. – AliN11 Jan 21 '14 at 14:39
Have you tried `curl -L http://favanews.com/index.aspx/n/10624` to check connectivity? You need to be able to connect to that URL first – Hüseyin BABAL Jan 21 '14 at 14:40
Thank you, bro. I think the problem is with the URL. But I want to know: is cURL faster, or this method? – AliN11 Jan 21 '14 at 14:48
Thank you, and sorry for my bad English today (: – AliN11 Jan 21 '14 at 14:54
cURL is different. cURL is only for getting the response from an endpoint URL. In order to parse it, you need to use regular expressions in PHP. cURL doesn't parse. If it worked for you, you can mark my answer as helpful. – Hüseyin BABAL Jan 21 '14 at 15:01
By using `file_get_html($url)` you are combining two separate things: fetching the data from the URL, and parsing the result of that fetch. If you use cURL or some other method to fetch the data, you can use `str_get_html` instead. This has nothing to do with being "slower", though. – IMSoP Jan 21 '14 at 15:14
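
The split IMSoP describes (fetch a string first, parse it second) can also be sketched without simple_html_dom, using the `DOMDocument` and `DOMXPath` classes from PHP's bundled DOM extension. This is a minimal sketch, not the method from the answer: `extractDivText` is a hypothetical helper name, and the literal `$data` string stands in for whatever `curl_exec()` returned in the question's code. Character-encoding handling is left out.

```php
<?php
// Hypothetical helper: return the trimmed text of every <div> carrying $class.
// Assumes $class is a plain class name without quote characters.
function extractDivText(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word, so class="news featured" also counts.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $texts = [];
    foreach ($xpath->query($query) as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

// Stand-in for the string that curl_exec() returns when CURLOPT_RETURNTRANSFER is set.
$data = '<html><body><div class="news">First item</div>'
      . '<div class="desc">Other</div>'
      . '<div class="news featured">Second item</div></body></html>';

echo implode("\n", extractDivText($data, 'news')), "\n";
```

Because the helper only takes a string, it does not matter whether the HTML came from cURL, `file_get_contents`, or a local file; any speed difference between fetching methods is separate from the parsing step.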
