I am trying to parse an HTML page:
<div class="gs_r"><h3 class="gs_rt"><span class="gs_ctc">[BOOK]</span> <a href="http://exampleA.com" onmousedown="return scife_clk(this.href,'','res','1')">titleA</a></h3><div class="gs_ggs gs_fl"><a href="http://exampleApdf.pdf" onmousedown="return scife_clk(this.href,'gga','gga','1')">
<div class="gs_r"><h3 class="gs_rt"><span class="gs_ctc">[BOOK]</span> <a href="http://exampleB.com" onmousedown="return scife_clk(this.href,'','res','1')">titleB</a></h3><div class="gs_ggs gs_fl"><a href="http://exampleB.doc" onmousedown="return scife_clk(this.href,'gga','gga','1')">
From that HTML page, we can get the following information: page links (http://exampleA.com, http://exampleB.com), titles (titleA, titleB), and document links (http://exampleApdf.pdf, http://exampleB.doc).
However, I only want the information for documents that have a PDF link. So from this example, I just want to get: http://exampleA.com, titleA, http://exampleApdf.pdf.
I've been trying, but it gives me a blank result. How can I get them? Thank you! :) Here's the code:
<?php
include 'simple_html_dom.php';

$url = 'http://scholar.google.com/scholar?hl=en&q=data+mining&btnG=&as_sdt=1%2C5&as_sdtp=';
$html = file_get_html($url);

foreach ($html->find('div[class=gs_ggs gs_fl]') as $pdfLink) {
    if (preg_match('/\.pdf$/i', $pdfLink)) {
        $html2->find('span[class=gs_ctc]');
        echo $html2 . $pdfLink;
    }
}
?>
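For reference, here is a minimal sketch of one way the filtering could work. It makes several assumptions: it uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom, it runs against a cleaned-up copy of the sample markup above (closing tags added, onmousedown attributes dropped) rather than the live page, and the function name extractPdfResults is hypothetical. The main differences from the code above are that the regex is tested against the link's href string rather than the whole div element, and that the title link is looked up inside the same gs_r block instead of through the undefined $html2.

```php
<?php
// Cleaned-up copy of the sample markup from the question (assumption:
// closing tags added and onmousedown attributes removed for brevity).
$sample = <<<HTML
<div class="gs_r"><h3 class="gs_rt"><span class="gs_ctc">[BOOK]</span> <a href="http://exampleA.com">titleA</a></h3><div class="gs_ggs gs_fl"><a href="http://exampleApdf.pdf"></a></div></div>
<div class="gs_r"><h3 class="gs_rt"><span class="gs_ctc">[BOOK]</span> <a href="http://exampleB.com">titleB</a></h3><div class="gs_ggs gs_fl"><a href="http://exampleB.doc"></a></div></div>
HTML;

// Hypothetical helper: returns one entry per result whose document link ends in .pdf.
function extractPdfResults(string $markup): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($markup);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $results = [];

    // Each search result sits in its own <div class="gs_r">.
    foreach ($xpath->query('//div[@class="gs_r"]') as $result) {
        $titleLink = $xpath->query('.//h3[@class="gs_rt"]/a', $result)->item(0);
        $docLink   = $xpath->query('.//div[@class="gs_ggs gs_fl"]/a', $result)->item(0);
        if ($titleLink === null || $docLink === null) {
            continue; // skip results without a title link or a document link
        }

        $docHref = $docLink->getAttribute('href');
        // Match against the href string, not the element itself.
        if (preg_match('/\.pdf$/i', $docHref)) {
            $results[] = [
                'page'  => $titleLink->getAttribute('href'),
                'title' => trim($titleLink->textContent),
                'pdf'   => $docHref,
            ];
        }
    }

    return $results;
}

foreach (extractPdfResults($sample) as $r) {
    echo $r['page'], ', ', $r['title'], ', ', $r['pdf'], PHP_EOL;
}
// prints: http://exampleA.com, titleA, http://exampleApdf.pdf
```

The class names in the XPath expressions are taken from the sample above; the live page's markup may differ, so the selectors might need adjusting.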