php simple_html_dom scraping issue

Question

I am trying to scrape content from a site with simple_html_dom, using this code:

$html = file_get_html('http://www.aswaqcity.com/thread1230092.html');
//echo $html;
// Find all article blocks
foreach($html->find('/html/body/div[2]/div[1]/div/div/div/table[1]/tbody/tr[2]/td[2]') as $article) {
    $item['title']      = $article->find('/div[1]/strong', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);

I copied the XPath from Firebug, but nothing is scraped.

@Enissay So, are the answers to [this question](http://stackoverflow.com/questions/9378107/how-to-use-xpath-in-php-simple-html-dom-parser) wrong? Not familiar with PHP, just curious. It seems to me XPath expressions can be used: http://simplehtmldom.sourceforge.net/manual.htm#section_find. — Mathias Müller, Jan 01 '15 at 22:07
@MathiasMüller Scratch that, both are supported (my bad)... I tried to explore the code, but it looks like it has some encoding problem when displaying the result and which I couldn't solve... — Enissay, Jan 01 '15 at 22:22
Please explain what you are trying to find on this page. What would be the expected output? @Enissay No worries - I misread specifications all the time myself. — Mathias Müller, Jan 01 '15 at 22:24

score 1 · Answer 1 · edited Jan 03 '15 at 13:30

Most likely the tbody isn't actually in the page source. Browsers add tbody elements to the DOM whenever they are missing, so an XPath copied from Firebug can include a step that the raw HTML doesn't have.

Also, you should be using CSS selectors instead of XPath; that's the whole point of using simple-html-dom.
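
The tbody point can be checked without a browser. The following is a minimal sketch, not the asker's actual page: the markup in `$source` is hypothetical and only mimics a table whose raw HTML has no tbody. It uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom, so it runs without the library, and it compares the browser-style path against the same path with the tbody step removed.

```php
<?php
// Hypothetical markup standing in for the page's raw source. Note that it
// contains no <tbody>, even though a browser's inspector would display one.
$source = <<<HTML
<html><body>
<table>
  <tr><td>header</td><td>header</td></tr>
  <tr><td>author</td><td><div><strong>Thread title</strong></div></td></tr>
</table>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($source);   // libxml keeps the table as written; it does not add <tbody>
$xpath = new DOMXPath($doc);

// Path as copied from a browser inspector (with tbody) ...
$withTbody = $xpath->query('//table[1]/tbody/tr[2]/td[2]/div[1]/strong');
// ... and the same path with the tbody step dropped.
$withoutTbody = $xpath->query('//table[1]/tr[2]/td[2]/div[1]/strong');

echo $withTbody->length, "\n";      // number of matches for the browser-style path
echo $withoutTbody->length, "\n";   // number of matches without tbody

if ($withoutTbody->length > 0) {
    echo trim($withoutTbody->item(0)->textContent), "\n";
}
```

On this sample the path containing tbody matches nothing, while the path without it finds the strong element. If the real page behaves the same way, dropping tbody from the expression in the question is the first thing to try.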

edited Jan 03 '15 at 13:30 by hakre

answered Jan 01 '15 at 23:12 by pguardiario

Why is using CSS the point of simple-html-dom? (No criticism intended, I am asking out of curiosity) – Mathias Müller Jan 01 '15 at 23:20
Because you don't need simple-html-dom to get that stuff with xpath. There's built-in Dom functions that can do that. – pguardiario Jan 02 '15 at 02:46
Ah, that makes sense. Thanks! Also, + 1 - `tbody` is the culprit most likely. – Mathias Müller Jan 02 '15 at 09:48
