On the website http://www.apkmirror.com/apk/opera-software-asa/opera-mini/opera-mini-28-0-2254-119213-release/opera-mini-fast-web-browser-28-0-2254-119213-2-android-apk-download/, I'm trying to extract several fields from the same XPath selector using Item Loaders. To avoid code repetition, I'd like to use the nested_xpath method.

To this end, I would like a relative XPath selector that is essentially a 'no-op' and returns the input selection unchanged. I thought this should be .//*, but that does not seem to work.

If I start the Scrapy shell with

scrapy shell http://www.apkmirror.com/apk/opera-software-asa/opera-mini/opera-mini-28-0-2254-119213-release/opera-mini-fast-web-browser-28-0-2254-119213-2-android-apk-download/ -s USER_AGENT=Mozilla

Then the following XPath expression gives me the desired result:

In [2]: response.xpath('//*[@title="APK details"]/following-sibling::*//text()')
   ...: .extract()
Out[2]: 
['Version: 28.0.2254.119213 (281119213)',
 'arm ',
 'Package: com.opera.mini.native',
 '\n',
 '183 downloads ']

However, if I chain .xpath('.//*') onto this expression, the result becomes an empty list:

In [3]: response.xpath('//*[@title="APK details"]/following-sibling::*//text()')
   ...: .xpath('.//*').extract()
Out[3]: []

What would be the correct 'no-op' XPath selector in this case?

Kurt Peek
  • I don't understand what you mean by _"no-op XPath selector"_. Can you share some code using item loaders and nested_xpath with some sample HTML and expected output? – paul trmbrth Jul 18 '17 at 15:00
  • @KurtPeek Hmm indeed. Maybe you could move `text()` to the chained XPath, like so: `response.xpath('//*[@title="APK details"]/following-sibling::*').xpath('.//text()').extract()` – Psidom Jul 18 '17 at 15:00
  • Note: with lxml (and Scrapy by extension), further XPath expressions cannot be applied to text nodes. This is a limitation (or bug). So `response.xpath('....//text()').xpath('./some/xpath')` will always give an empty result. – paul trmbrth Jul 18 '17 at 15:06

1 Answer

Following the comments by Psidom and paul trmbrth, I moved text() to the chained XPath: the first expression selects the sibling elements, and the chained .//text() extracts their text. This still repeats text(), but no longer the whole XPath expression.

Kurt Peek