I am trying to scrape Stack Overflow's newest PHP questions, 45 questions per page, using simple_html_dom for the parsing. I am almost done, but I couldn't scrape the number of answers given to each question, because they use two separate div tags for it. The code is below, and I am also attaching a screenshot of what the executed code outputs.

include_once('simple_html_dom.php');
function httpGet($url)
{
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
    $output=curl_exec($ch);
    curl_close($ch);
    return $output;
}
$count=45;
$url ='http://stackoverflow.com/questions/tagged/php?page=1&sort=newest&pagesize='.$count;
$parse = httpGet($url);
$html = str_get_html($parse);

for($i=0;$i<$count;$i++){ // indexes 0..44 cover the 45 questions on the page

    $qu=$html->find('a[class=question-hyperlink]', $i)->href;
    $que='https://stackoverflow.com'.$qu;
    $question=$html->find('a[class=question-hyperlink]', $i)->plaintext;
    $link='<a href="'.$que.'">'.$question.'</a>';
    $time=$html->find('span[class=relativetime]',$i)->plaintext;
    $views=$html->find('.views',$i)->plaintext;
    $vote=$html->find('span[class=vote-count-post]',$i)->plaintext;
    $stat1=$html->find('div[class=status answered]',$i)->plaintext;
    echo'<h3>'.$link.'</h3>&nbsp&nbspAsked:&nbsp'.$time.'Vote:'.$vote.'View:'.$views.'Answers: '.'<br><br>';
}

Scraped image

In the image you can see "Answers:". That is where I want to show the number of answers each question has received. I am looking for a solution with simple_html_dom, although regex answers will also work.

Thanks

StackB00m
  • Regarding regex -> [look here](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags), the answer contains everything you need. – Andrei Aug 08 '16 at 13:44
  • @Andrew I've read various docs about regex but couldn't find any solution for this part. Check it out – StackB00m Aug 08 '16 at 13:45
  • Can you not just target `div[class=status]`? – Steve Aug 08 '16 at 13:54
  • I've tried it @Andrew, it gives no results – StackB00m Aug 08 '16 at 13:58
  • The element has two classes, `status` and either `answered` or `unanswered`. If you target `status` you will get the element no matter what other classes it has – Steve Aug 08 '16 at 14:01
  • Hey @Steve, I had tried it before, but I uploaded the code from **phpstorm** and, due to some issues, the code didn't get updated, so it didn't work. Now it works fine, thanks :) – StackB00m Aug 08 '16 at 14:24
  • No problem, glad I could help – Steve Aug 08 '16 at 14:25
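
The point made in the comments, that the element carries the class `status` plus either `answered` or `unanswered`, can be illustrated with a small self-contained sketch. It uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom so that it runs without extra files. The sample markup and the function name `answerCounts` are assumptions made up for illustration, not copied from the live page.

```php
<?php
// Hypothetical markup imitating two question summaries (an assumption,
// not the real Stack Overflow page source).
$sample = <<<HTML
<div class="question-summary">
  <div class="status answered"><strong>2</strong> answers</div>
</div>
<div class="question-summary">
  <div class="status unanswered"><strong>0</strong> answers</div>
</div>
HTML;

/**
 * Return the answer count of every div whose class list contains "status".
 */
function answerCounts(string $html): array
{
    $doc = new DOMDocument();
    // Suppress the warnings that real-world HTML tends to trigger.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);
    // Match "status" as a whole class token, whatever other classes are present.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' status ')]";
    $counts = [];
    foreach ($xpath->query($query) as $node) {
        // Keep only the digits, so "2 answers" becomes 2.
        $counts[] = (int) preg_replace('/\D+/', '', $node->textContent);
    }
    return $counts;
}

print_r(answerCounts($sample));
```

With simple_html_dom, the thread above reports that targeting `div[class=status]` inside the existing loop has the same effect, so the count can be read from that element's `plaintext` and appended after `Answers: ` in the `echo`.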
