
I use Scrapy for a project with an item pipeline specifically designed to insert item fields into a database. I'm using a Python decorator to make this work. For some reason I can't wrap my head around this issue: I get a NameError and I'm not sure where it is coming from. Note: other people have confirmed this method works perfectly fine.

This is the code I have in my spider.py file:

from scrapy.spider import Spider
from scrapy.http import Request,FormRequest
from exampleScraper.items import exampleItem
import urllib, time, MySQLdb, sys

today = time.strftime("%x %X")

class idoxpaSpider(Spider):
  pipeline = set(['insert'])

  name = 'idoxpaSpider'

  start_urls = ["http://www.example.com"]
  ###
  def parse(self, response):
    # some scrapy work 
    return item

###
class insert(object):
  def __init__(self):
    self.conn = MySQLdb.connect(<some parameters>)
    self.cursor = self.conn.cursor()

  @check_spider_pipeline
  def process_item(self, item, spider):
    return item

and this is what I have in my pipeline file:

import functools
import sys

from scrapy import log

class BoroughscrperPipeline(object):
  def process_item(self, item, spider):
    def check_spider_pipeline(process_item_method):
      @functools.wraps(process_item_method)
      def wrapper(self, item, spider):
        # message template for debugging
        msg = '%%s %s pipeline step' % (self.__class__.__name__,)

        # if this class is in the spider's pipeline set, use process_item normally
        if self.__class__ in spider.pipeline:
          spider.log(msg % 'executing', level=log.DEBUG)
          return process_item_method(self, item, spider)

        # otherwise return the item untouched
        else:
          spider.log(msg % 'skipping', level=log.DEBUG)
          return item
      return wrapper

and this is the actual error I am getting:

File "/home/mn/workbench/boroughScrper/boroughScrper/spiders/westminsterSpider.py", line 40, in insert
    @check_spider_pipeline
NameError: name 'check_spider_pipeline' is not defined

Any idea where this is going wrong?

mehdix_
1 Answer


check_spider_pipeline is "not defined" because it cannot be found in the scope visible to your spider.py: it is defined in the locals of BoroughscrperPipeline.process_item, so it only exists while that method is running. Define it at module level in your pipelines file and import it into your spider. Check this answer for an example: How can I use different pipelines for different spiders in a single Scrapy project
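Here is a minimal, self-contained sketch of that fix. The decorator is moved to module level so it can be imported from any spider module. DummySpider and the Insert class are stand-ins for your real Scrapy spider and pipeline, and the spider.log calls are omitted so the sketch runs outside Scrapy:

```python
# pipelines.py sketch: check_spider_pipeline lives at module level, so a
# spider can do e.g. "from boroughScrper.pipelines import check_spider_pipeline"
# (path hypothetical). DummySpider stands in for a real scrapy Spider.
import functools

def check_spider_pipeline(process_item_method):
    @functools.wraps(process_item_method)
    def wrapper(self, item, spider):
        # run the step only if this pipeline class is in the spider's set
        if self.__class__ in spider.pipeline:
            return process_item_method(self, item, spider)
        # otherwise return the item untouched
        return item
    return wrapper

class Insert(object):
    @check_spider_pipeline
    def process_item(self, item, spider):
        item['inserted'] = True
        return item

class DummySpider(object):
    # the set holds the pipeline *class*, not a string, because the
    # decorator compares self.__class__ against spider.pipeline
    pipeline = {Insert}

item = Insert().process_item({}, DummySpider())
print(item)  # {'inserted': True}
```

Note also that the spider declares pipeline = {Insert} with the class itself rather than set(['insert']): since the decorator tests self.__class__ in spider.pipeline, a set of strings would silently skip every pipeline step.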

bosnjak