I'm new to Python and Scrapy, so I apologise in advance if these questions are silly. I'm having trouble with the item loader's default processors and have a few related questions.
I set default_input_processor to TakeFirst() so that only the first value of each extracted list is kept:

    class CaseLoader(scrapy.loader.ItemLoader):
        default_input_processor = TakeFirst()

and I use it like this:

    def load_row_data(self, row):
        cl = CaseLoader(CaseItem(), row)
        cl.add_xpath('case_num', './/td[1]/a/text()')
        cl.add_xpath('case_link', './/td[1]/a/@href')
        cl.add_xpath('name', './/td[3]/text()')
        return cl.load_item()

Then I yield this item from the callback method, but TakeFirst() doesn't seem to be applied: each field is a list instead of a string. If I use TakeFirst() as default_output_processor instead, it works. How does default_input_processor work? Why isn't the TakeFirst() processor applied in this case?
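For reference, the behaviour described above can be reproduced without Scrapy. The sketch below is a simplified pure-Python model, not Scrapy's actual implementation: `TinyLoader`, `take_first`, and `identity` are hypothetical stand-ins for ItemLoader, TakeFirst, and the default Identity processor. It assumes that the result of an input processor is appended to an internal per-field list, and that only the output processor sees that whole list when the item is built.

```python
def take_first(values):
    """Return the first value that is neither None nor '' (mimics TakeFirst)."""
    for value in values:
        if value is not None and value != '':
            return value
    return None


def identity(values):
    """Return the values unchanged (mimics the default Identity processor)."""
    return values


class TinyLoader:
    """Hypothetical stand-in for ItemLoader with one input and one output processor."""

    def __init__(self, input_processor=identity, output_processor=identity):
        self.input_processor = input_processor
        self.output_processor = output_processor
        self.collected = {}  # field name -> list of collected values

    def add_value(self, field, values):
        # The input processor runs on each batch of extracted data...
        processed = self.input_processor(values)
        if processed is None:
            processed = []
        elif not isinstance(processed, (list, tuple)):
            processed = [processed]
        # ...and its result is appended to the list kept for that field.
        self.collected.setdefault(field, []).extend(processed)

    def load_item(self):
        # The output processor receives the whole collected list.
        return {field: self.output_processor(values)
                for field, values in self.collected.items()}


# TakeFirst as the input processor: the single value is stored back in a list.
as_input = TinyLoader(input_processor=take_first)
as_input.add_value('name', ['Smith', 'Jones'])

# TakeFirst as the output processor: the list is reduced when the item is built.
as_output = TinyLoader(output_processor=take_first)
as_output.add_value('name', ['Smith', 'Jones'])

print(as_input.load_item())   # the field is still a list
print(as_output.load_item())  # the field is a plain string
```

Under this model, a processor that reduces a list to a single value only has a visible effect on the final item when it is used as the output processor, which matches what I observe.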
In the documentation I saw the unicode.strip method being used:

    from scrapy.loader import ItemLoader
    from scrapy.loader.processors import TakeFirst, MapCompose, Join

    class ProductLoader(ItemLoader):
        default_output_processor = TakeFirst()

        name_in = MapCompose(unicode.title)
        name_out = Join()

        price_in = MapCompose(unicode.strip)

        # ...

But when I try to use it inside Compose() in my own item loader, I get this error:
NameError: name 'unicode' is not defined
If I understand correctly, this method should remove whitespace from the beginning and end of a string. How do I use it properly? Do I need to write and use my own strip function instead?
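A minimal sketch of the stripping step, assuming the code runs under Python 3 (which the NameError suggests, since `unicode` is a built-in only in Python 2). In Python 3 the built-in text type is `str`, so `str.strip` plays the role that `unicode.strip` has in the documentation example. The sample values are made up for illustration.

```python
# Assumption: Python 3, where `str` is the text type and `unicode` is not defined.
raw_values = ['  Smith v. Jones \n', '\t12-345  ']  # made-up sample data

# str.strip is a plain function on the class: str.strip(s) is the same as s.strip(),
# so it can be passed around wherever a one-argument callable is expected.
cleaned = [str.strip(value) for value in raw_values]

# With Scrapy installed, the analogous field declaration would presumably be:
#     name_in = MapCompose(str.strip)
print(cleaned)
```

If that holds, no custom strip function is needed; passing `str.strip` in place of `unicode.strip` should be enough.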