
I'm new to Python and Scrapy, so I apologise in advance for what may be silly questions. I'm having some trouble with the default item loader processors, and I have a few related questions:

  1. I use the default_input_processor attribute to extract the first value from a list with the TakeFirst() processor, like this:

    from scrapy.loader import ItemLoader
    from scrapy.loader.processors import TakeFirst
    class CaseLoader(ItemLoader):
        default_input_processor = TakeFirst()
    

    And here is how I use it:

      def load_row_data(self, row):
          cl = CaseLoader(CaseItem(), row)
    
          cl.add_xpath('case_num',  './/td[1]/a/text()')
          cl.add_xpath('case_link', './/td[1]/a/@href')
          cl.add_xpath('name',      './/td[3]/text()')
          return cl.load_item()
    

    I then yield this item from a callback method, but TakeFirst() doesn't work: I get a list instead of a string. If I use TakeFirst() as default_output_processor, it works. How does default_input_processor work? Why isn't the TakeFirst() processor applied in this case?

  2. In the documentation I saw the unicode.strip method being used:

    from scrapy.loader import ItemLoader
    from scrapy.loader.processors import TakeFirst, MapCompose, Join
    
    class ProductLoader(ItemLoader):
    
        default_output_processor = TakeFirst()
    
        name_in = MapCompose(unicode.title)
        name_out = Join()
    
        price_in = MapCompose(unicode.strip)
    
        # ...
    

    But when I tried to use it inside Compose() in my own item loader, I got this error:

    NameError: name 'unicode' is not defined
    

    If I understand correctly, this method should remove whitespace from the beginning and end of a string. How do I use it properly? Do I need to write and use my own strip function instead?

Hemul

1 Answer


That is because the documentation uses Python 2 and you are using Python 3.

There is no unicode type in Python 3. Use str instead:

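# Newer Scrapy versions provide these processors in itemloaders.processors
from scrapy.loader import ItemLoader
from scrapy.loader.processors import TakeFirst, MapCompose, Join

class ProductLoader(ItemLoader):

    default_output_processor = TakeFirst()

    name_in = MapCompose(str.title)
    name_out = Join()

    price_in = MapCompose(str.strip)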
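Since Compose() was mentioned in the question: as far as I know, MapCompose() calls each function on every extracted value, while Compose() passes the whole list of values to its first function, so str.strip fits naturally into MapCompose(). The plain-Python sketch below imitates that difference; map_compose is a hypothetical stand-in written for illustration, not Scrapy's implementation.

```python
def map_compose(*functions):
    """Hypothetical stand-in for MapCompose: apply each function to every
    value in turn, dropping None results (not Scrapy's actual code)."""
    def process(values):
        for func in functions:
            values = [result for result in (func(v) for v in values)
                      if result is not None]
        return values
    return process


# In Python 3, str.strip and str.title work as ordinary one-argument functions.
clean = map_compose(str.strip, str.title)
print(clean(["  acme widget  ", "\tblue gadget\n"]))  # ['Acme Widget', 'Blue Gadget']

# Calling str.strip on the whole list (as a Compose-style pipeline would) fails.
try:
    str.strip(["  acme widget  "])
except TypeError as exc:
    print("TypeError:", exc)
```

So no custom strip function is needed; str.strip can be passed directly, as long as it receives individual strings rather than the list.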

See also the thread below for more information:

NameError: global name 'unicode' is not defined - in Python 3

Tarun Lalwani
  • Thank you for the answer, but could you explain my first question too? I can provide more information about the problem if needed. – Hemul Oct 07 '17 at 11:50
  • Because you didn't add an output processor. Add `default_output_processor = TakeFirst()` and then try again. – Tarun Lalwani Oct 07 '17 at 11:57
  • So default_input_processor doesn't work without default_output_processor? I want to use a different processor as default_output_processor. Can I use TakeFirst() as default_input_processor, and if not, why? – Hemul Oct 07 '17 at 12:04
  • That is the way it is written: `if processed_value: self._values[field_name] += arg_to_iter(processed_value)`. After the input processor is applied, its result is converted back into a list so that the output processor can be applied, because processors always expect a list as input. So with TakeFirst() as the input processor you still end up with a one-element list. – Tarun Lalwani Oct 07 '17 at 12:40
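To make the flow described in the last comment concrete, here is a minimal pure-Python model of an item loader. It is a hypothetical, simplified sketch, not Scrapy's source: MiniLoader, arg_to_iter, take_first and identity are illustrative stand-ins, assuming only the behaviour quoted above (input processor, then the result is wrapped back into a list, then the output processor runs at load_item() time).

```python
# Hypothetical, simplified model of an item loader's value flow.
# NOT Scrapy's code; it mirrors the behaviour quoted in the comment:
# input processor -> result wrapped back into a list -> output processor.


def arg_to_iter(value):
    """Wrap a single value in a list; None becomes an empty list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def take_first(values):
    """Return the first value that is not None or empty (like TakeFirst())."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


def identity(values):
    """Return the values unchanged (used here as the default processor)."""
    return values


class MiniLoader:
    def __init__(self, input_processor=identity, output_processor=identity):
        self.input_processor = input_processor
        self.output_processor = output_processor
        self._values = {}

    def add_value(self, field_name, value):
        # Run the input processor on the extracted values...
        processed = self.input_processor(arg_to_iter(value))
        if processed:
            # ...then wrap the result back into a list before storing it,
            # so the output processor always receives a list.
            self._values.setdefault(field_name, []).extend(arg_to_iter(processed))

    def load_item(self):
        # The output processor runs once per field, on the collected list.
        return {name: self.output_processor(values)
                for name, values in self._values.items()}


# Input processor only: the field is still a one-element list.
loader = MiniLoader(input_processor=take_first)
loader.add_value("case_num", ["123", "456"])
print(loader.load_item())  # {'case_num': ['123']}

# Adding an output processor unwraps it to a plain string.
loader = MiniLoader(input_processor=take_first, output_processor=take_first)
loader.add_value("case_num", ["123", "456"])
print(loader.load_item())  # {'case_num': '123'}
```

In this model the input processor only decides what gets appended per add_value() call; the final shape of the field is decided by the output processor, which is why keeping TakeFirst() as the input processor still requires some output processor that unwraps the list.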