I'm making a crawler with Scrapy and wondering why my XPath doesn't work when my CSS selector does. I want to get the number of commits from this HTML:

<li class="commits">
    <a data-pjax="" href="/samthomson/flot/commits/master">
        <span class="octicon octicon-history"></span>
        <span class="num text-emphasized">
          521
        </span>
        commits
    </a>
  </li>

XPath:

response.xpath('//li[@class="commits"]//a//span[@class="text-emphasized"]//text()').extract()

CSS:

response.css('li.commits a span.text-emphasized').css('::text').extract()

CSS returns the number (unescaped), but XPath returns nothing. Am I using the // for nested elements correctly?

James A Mohler
S..

1 Answer

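The predicate [@class="text-emphasized"] compares against the whole value of the class attribute, and the span's class attribute is "num text-emphasized", so it never matches. Use the contains function to check only that text-emphasized is present: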

response.xpath('//li[@class="commits"]//a//span[contains(@class, "text-emphasized")]//text()').extract()[0].strip()

Otherwise, include num as well so that the full attribute value matches:

response.xpath('//li[@class="commits"]//a//span[@class="num text-emphasized"]//text()').extract()[0].strip()

Here extract() returns the matched text nodes as strings, [0] takes the first one, and strip() removes the leading and trailing whitespace, leaving just the number.
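To see the difference without a Scrapy project, here is a minimal sketch using Python's standard-library xml.etree.ElementTree as a stand-in. This is an assumption made for illustration, not what Scrapy uses internally: ElementTree supports only a subset of XPath and has no contains(), so the "class includes text-emphasized" check is done in Python instead. The variable names are made up for the example, and the snippet is the HTML from the question with the closing li tag completed.

```python
import xml.etree.ElementTree as ET

# HTML from the question, closed properly so it parses as XML.
SNIPPET = """
<li class="commits">
    <a data-pjax="" href="/samthomson/flot/commits/master">
        <span class="octicon octicon-history"></span>
        <span class="num text-emphasized">
          521
        </span>
        commits
    </a>
</li>
""".strip()

root = ET.fromstring(SNIPPET)

# Exact comparison against only one of the two class names: no match.
partial = root.findall(".//span[@class='text-emphasized']")

# Exact comparison against the full attribute value: one match.
full = root.findall(".//span[@class='num text-emphasized']")

# Token-based check in Python (ElementTree has no contains()).
by_token = [
    span for span in root.iter("span")
    if "text-emphasized" in span.get("class", "").split()
]

print(len(partial))               # 0
print(full[0].text.strip())       # 521
print(by_token[0].text.strip())   # 521
```

Splitting the attribute into tokens is a little stricter than contains(@class, "text-emphasized"), which is a plain substring test and would also match a class such as "text-emphasized-large".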

Sicco
  • Thanks. I thought that by specifying the text-emphasized class I was narrowing it down from all spans. Do you know why [@class="text-emphasized"] didn't work? For example, is the [@class="commits"] pointless? – S.. Sep 19 '15 at 12:39
  • In XPath, each extra predicate narrows the search, so you can leave predicates out when a more generic query is enough. E.g., these queries also work on this snippet: `response.xpath('//li/a/span/text()').extract()[0].strip()` or `response.xpath('//span/text()').extract()[0].strip()`. Note that `//` searches through all descendants, while `/` only searches the direct children. – Sicco Sep 19 '15 at 12:56
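
The child versus descendant distinction from the comment above can be sketched as follows. As before, this uses the standard-library ElementTree rather than Scrapy (an assumption made so the example runs on its own), on a simplified, hypothetical version of the snippet; the relative paths ./ and .// play the roles of / and // in the Scrapy queries.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical version of the question's markup.
li = ET.fromstring(
    '<li class="commits"><a href="#">'
    '<span class="num text-emphasized">521</span> commits'
    '</a></li>'
)

child_spans = li.findall("./span")        # spans that are direct children of <li>
descendant_spans = li.findall(".//span")  # spans at any depth below <li>
stepwise = li.findall("./a/span")         # one child step at a time: li -> a -> span

print(len(child_spans), len(descendant_spans), len(stepwise))  # 0 1 1
```

The span is a grandchild of the li element, so the single child step finds nothing, while the descendant search and the explicit li, a, span path both find it.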