I'm inspecting a page with Chrome DevTools and have the XPath of an element on the page. I deliberately disabled JavaScript so that the DOM doesn't get changed. However, the XPath Chrome gives for the element returns [] in Scrapy, although the element clearly exists. What might be the problem?

In particular, the XPath //*[@id="prddeatailed_container"]/table[1]/tbody/tr[1]/td/div/table/tbody/tr[2]/td[1]/span on the page http://cheaptool.ru/product/sadovyj-pylesos-billy-goat-lb351/ should select the price, 29 990.

$ scrapy shell 'http://cheaptool.ru/product/sadovyj-pylesos-billy-goat-lb351'

In [2]: xp1 = '//*[@id="prddeatailed_container"]/table[1]/tbody/tr[1]/td/div/table/tbody/tr[2]/td[1]/span'

In [3]: aaa = response.xpath(xp1)

In [4]: aaa
Out[4]: []

UPDATE: It turned out that the HTML in the response has no tbody. Why did Chrome show it in the XPath? How can I get an XPath that matches the real HTML?

Mario Honse
  • why don't you just use: `//span[@class="totalPrice"]` ? – user3616725 Mar 26 '15 at 12:43
  • @user3616725, the question is not what to use, but why doesn't it work. – Mario Honse Mar 26 '15 at 12:59
  • maybe read the [Scrapy manual](http://doc.scrapy.org/en/0.24/topics/firefox.html)? Specifically: **Never use full XPath paths, use relative and clever ones based on attributes or any identifying features...** and **Never include `<tbody>` elements in your XPath expressions unless you really know what you’re doing** – user3616725 Mar 26 '15 at 15:14

2 Answers

"I disable javascript deliberately so DOM doesn't get changed"

Besides JavaScript, the DOM can also change because browsers usually apply error-correction algorithms to the HTML source so that it can be rendered reasonably well.

"@user3616725, the question is not what to use, but why doesn't it work"

A common case is the one you discovered while I was writing this answer: Chrome added the <tbody> tag automatically. See the following discussion for an explanation of this behavior:

Why does my XPath query (scraping HTML tables) only work in Firebug, but not the application I'm developing?

"It turned out in the result html there was no tbody. Why did Chrome showed it in xpath? How to make it the real html in xpath?"

The HTML as rendered by Chrome does contain <tbody>, which is why Chrome showed it in the XPath. Chrome DevTools works against the final DOM, which may differ from the actual HTML source, so you can't rely on an XPath copied from Chrome working unchanged in Scrapy.
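To see the effect in isolation, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's selectors, and the markup is a hypothetical, heavily simplified version of the page (only the `prddeatailed_container` id and the `totalPrice` class come from this thread). Like the raw source, it contains no `<tbody>`:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup. As in the raw page source, the table
# rows sit directly under <table>, with no <tbody> in between.
RAW_HTML = """
<html><body>
  <div id="prddeatailed_container">
    <table>
      <tr><td>Price</td></tr>
      <tr><td><span class="totalPrice">29 990</span></td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(RAW_HTML)

# Path in the style of a dev-tools copy: it contains the <tbody> step that
# the browser added to its DOM, so nothing in the raw source matches it.
with_tbody = root.findall(
    ".//*[@id='prddeatailed_container']/table/tbody/tr[2]/td[1]/span"
)

# The same path without the tbody step matches the raw source.
without_tbody = root.findall(
    ".//*[@id='prddeatailed_container']/table/tr[2]/td[1]/span"
)

# An attribute-based path does not depend on the table nesting at all.
by_class = root.findall(".//span[@class='totalPrice']")

print(with_tbody)                        # []
print([s.text for s in without_tbody])   # ['29 990']
print([s.text for s in by_class])        # ['29 990']
```

The attribute-based path is the more robust choice, since it keeps working whether or not a parser inserts `<tbody>`.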

har07

Since you mention tbody: a lot of HTML omits tbody (the tag is optional in the source), and Chrome adds it automatically when it builds the DOM. If you print the response HTML, you won't find any tbody.
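A rough way to act on this: check whether the raw source contains a `<tbody>` tag and, if it doesn't, drop the `tbody` steps from the XPath copied from Chrome. The helper names `source_has_tbody` and `strip_tbody_steps` below are made up for illustration and are not part of Scrapy, and the sample HTML is a hypothetical stand-in for the real response body:

```python
import re


def source_has_tbody(html: str) -> bool:
    """Return True if the raw HTML text contains a <tbody> start tag."""
    return re.search(r"<tbody\b", html, re.IGNORECASE) is not None


def strip_tbody_steps(xpath: str) -> str:
    """Remove '/tbody' and '/tbody[n]' location steps from an XPath string."""
    return re.sub(r"/tbody(?:\[\d+\])?(?=/|$)", "", xpath)


# XPath exactly as copied from Chrome in the question.
chrome_xpath = (
    '//*[@id="prddeatailed_container"]/table[1]/tbody/tr[1]/td/div'
    "/table/tbody/tr[2]/td[1]/span"
)

# Hypothetical stand-in for the raw response body (no <tbody> present).
raw_html = "<table><tr><td><span>29 990</span></td></tr></table>"

# Only rewrite the path when the source really lacks <tbody>.
if not source_has_tbody(raw_html):
    chrome_xpath = strip_tbody_steps(chrome_xpath)

print(chrome_xpath)
```

In the Scrapy shell you would pass the response body (`response.text` in recent Scrapy versions) instead of `raw_html`. This is a plain text substitution, so it assumes that no table on the page has an explicit `<tbody>` in the source.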

Aminah Nuraini