Selecting a specific div from an external webpage using cURL

Question

Hi, can anyone help me select a specific div from the content of a webpage?

Let's say I want to get the div with id="wrapper_content" from the webpage http://www.test.com/page3.php.

My current code looks something like this (not working):

//REG EXP.
$s_searchFor = '@^/.dont know what to put here..@ui';    

//CURL
$ch = curl_init();
$timeout = 5; // set to zero for no timeout
curl_setopt ($ch, CURLOPT_URL, 'http://www.test.com/page3.php');
curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt ($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
if(!preg_match($s_searchFor, $ch))
{
  $file_contents = curl_exec($ch);
}
curl_close($ch);

// display file
echo $file_contents;

So I'd like to know how I can use regular expressions to find a specific div, and how to discard the rest of the webpage so that $file_contents only contains the div.

score 17 · Accepted Answer · edited May 23 '17 at 12:34

HTML isn't a regular language, so you shouldn't use regex to parse it. Instead I would recommend an HTML parser such as Simple HTML DOM or PHP's DOM extension.

If you were going to use Simple HTML DOM you would do something like the following:

$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);
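For the DOM extension mentioned above, a minimal sketch might look like the following. It assumes the page has already been fetched into $file_contents; the sample markup here is a hypothetical stand-in for the real page, and $div_html is just an illustrative variable name.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$file_contents = '<html><body><div id="header">Top</div>'
    . '<div id="wrapper_content"><p>Hello</p></div></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($file_contents);
libxml_clear_errors();

// Select the div by its id attribute with an XPath query.
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="wrapper_content"]');

$div_html = '';
if ($nodes !== false && $nodes->length > 0) {
    // saveHTML($node) serializes only that node, including its own tags.
    $div_html = $doc->saveHTML($nodes->item(0));
}

echo $div_html;
```

If no matching div is found, $div_html stays empty, so the caller can tell the two cases apart without touching an undefined index.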

Even if you used regex your code still wouldn't work correctly. You need to get the contents of the page before you can use regex.

//wrong
if(!preg_match($s_searchFor, $ch)){
    $file_contents = curl_exec($ch);
}

//right
$file_contents = curl_exec($ch); //get the page contents
if ($file_contents !== false && preg_match($s_searchFor, $file_contents, $matches)) {
    $file_contents = $matches[0]; //keep only the matched element
}
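The fetch step itself can also fail, since curl_exec() returns false on error. A rough sketch of a fetch that handles this is shown below; fetch_page() is a hypothetical helper name, and the overall timeout of twice the connect timeout is an arbitrary assumption, not something from the question.

```php
<?php
// fetch_page() is a hypothetical helper, not part of the cURL extension.
// Returns the response body as a string, or null if the request failed.
function fetch_page(string $url, int $timeout = 5): ?string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the body from curl_exec() instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Limit how long to wait for the connection and for the whole transfer.
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeout * 2);

    $body = curl_exec($ch);

    // The handle is released when $ch goes out of scope.
    return $body === false ? null : $body;
}
```

A call such as $file_contents = fetch_page('http://www.test.com/page3.php'); then yields either the page source or null, and the parsing or matching step should only run in the non-null case.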

str_get_html() function is not defined. Why? – huykon225, Aug 23 '17 at 09:53

score 4 · Answer 2 · answered May 27 '13 at 08:47

include('simple_html_dom.php');
$html = str_get_html($file_contents);
$elem = $html->find('div[id=wrapper_content]', 0);

Download simple_html_dom.php

Amit Garg

score 0 · Answer 3 · answered Apr 01 '10 at 09:43

Check out Hpricot (a Ruby HTML parser); it lets you elegantly select sections of a document.

First use cURL to fetch the document, then use Hpricot to extract the part you need.

imightbeinatree at Cloudspace
