
I'm trying to build a Hacker News scraper using Symfony 2's DomCrawler [1].

When I try out the XPath with a Chrome plugin [2], it works. But when I try it in my scraper, I keep getting the error "The current node list is empty."

Here's my scraper code:

$crawler1 = $client1->request('GET','https://news.ycombinator.com/item?id=8296437');
$hnpost->selftext = $crawler1->filterXPath('/html/body/center/table/tbody/tr[3]/td/table[1]/tbody/tr[4]/td[2]')->text();

[1] http://api.symfony.com/2.0/Symfony/Component/DomCrawler/Crawler.html#method_filter
[2] https://chrome.google.com/webstore/detail/xpath-helper/hgimnogjllphhhkhlmebbmlgjoejdpjl?hl=en-US

A F
  • possible duplicate of [Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?](http://stackoverflow.com/questions/18241029/why-does-my-xpath-query-scraping-html-tables-only-work-in-firebug-but-not-the) – Jens Erat Oct 12 '14 at 17:41

1 Answer


If the problem is what I think it is, I've been battered by this one a couple of times. Chrome implicitly adds any missing <tbody> tags to the DOM, so if you then copy the XPath or CSS path, you may also have copied tags that don't necessarily exist in the source document. Try viewing the page's source and see if the DOM reported by your browser's console corresponds to the original source HTML. If the <tbody> tags are absent, be sure to exclude them in your filterXPath() call.
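
To illustrate the effect, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath instead of DomCrawler itself (an assumption made to keep the example dependency-free; the markup in $html is hypothetical and only stands in for the real page source). The same path is queried with and without the <tbody> step that a browser would add:

```php
<?php
// Hypothetical markup standing in for the raw page source: there is no <tbody>.
$html = '<html><body><table><tr><td>first</td><td>second</td></tr></table></body></html>';

$doc = new DOMDocument();
// Suppress the warnings libxml may raise on real-world, non-validating HTML.
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Path as it might be copied from the browser, including the implied <tbody>.
$withTbody = $xpath->query('/html/body/table/tbody/tr/td[2]');

// Same path with the <tbody> step removed, so it matches the raw source.
$withoutTbody = $xpath->query('/html/body/table/tr/td[2]');

echo $withTbody->length, "\n";     // number of nodes matched with <tbody>
echo $withoutTbody->length, "\n";  // number of nodes matched without <tbody>
echo $withoutTbody->item(0)->textContent, "\n";
```

In this sketch the first query matches nothing, because the parser does not invent a <tbody> the way the browser does, while the second finds the cell. Applied to the path in the question, that would mean dropping both tbody steps from the filterXPath() argument, assuming the page source really does omit them.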

Rob Pomeroy