I'm trying to rewrite this piece of code to use the ItemLoader class:

import scrapy

from ..items import Book


class BasicSpider(scrapy.Spider):
    ...
    def parse(self, response):
        item = Book()

        # note: this grabs only the first of the many books on the page
        item['title'] = response.xpath('//*[@class="link linkWithHash detailsLink"]/@title')[0].extract()
        return item

The above works perfectly well. And now the same with ItemLoader:

import scrapy
from scrapy.loader import ItemLoader

from ..items import Book


class BasicSpider(scrapy.Spider):
    ...    
    def parse(self, response):
        l = ItemLoader(item=Book(), response=response)

        l.add_xpath('title', '//*[@class="link linkWithHash detailsLink"]/@title'[0])  # this does not work - returns an empty dict
        # l.add_xpath('title', '//*[@class="link linkWithHash detailsLink"]/@title')  # this works, of course, but returns every book title on the page, not just the first one, which is what I need
        return l.load_item()

I only want to grab the first book title. How do I achieve that?

Bartek R.

1 Answer

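Your code has two problems. First, XPath uses one-based indexing, so the first match is [1], not [0]. Second, the index has to be part of the XPath string you pass to the add_xpath method. Placed after the closing quote, [0] indexes the Python string itself rather than the XPath result.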
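To see what the misplaced index does, here is a small sketch in plain Python (no Scrapy needed). The variable names are only illustrative; the XPath is the one from the question.

```python
# The XPath expression from the question, as an ordinary Python string.
xpath = '//*[@class="link linkWithHash detailsLink"]/@title'

# Indexing outside the quotes slices the Python string: this is its first character.
misplaced = xpath[0]
print(repr(misplaced))  # '/'

# Indexing inside the expression: wrap the whole path in parentheses and
# ask for position 1, since XPath positions start at 1.
first_only = f'({xpath})[1]'
print(first_only)  # (//*[@class="link linkWithHash detailsLink"]/@title)[1]
```

So in the original call, add_xpath receives only '/' as its expression, not the expression you wrote.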

So the correct code would look like this:

l.add_xpath('title', '(//*[@class="link linkWithHash detailsLink"]/@title)[1]')
mihal277
  • Thanks for trying, but this won't give me what I need. Read more: http://stackoverflow.com/a/3676557/4279824 P.S. Your code would return an empty dict. – Bartek R. Oct 02 '16 at 17:16
  • I've tried this and it returns a one-element list. – Bartek R. Oct 03 '16 at 16:16
  • Isn't that what you want to achieve? If you define the field on your Book item this way: title = scrapy.Field(output_processor=TakeFirst()), you should get what you expect. – mihal277 Oct 03 '16 at 19:56