I have a spider written with the Scrapy framework, but I am having trouble getting any pipelines to work. I have the following code in my pipelines.py:

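class FilePipeline(object):

    def __init__(self):
        # Text mode, so the str built in process_item can be written directly.
        self.file = open('items.txt', 'w')

    def close_spider(self, spider):
        # Scrapy calls this hook when the spider finishes; close the file here.
        self.file.close()

    def process_item(self, item, spider):
        line = item['title'] + '\n'
        self.file.write(line)
        return item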

My CrawlSpider subclass has this line to activate the pipeline for this class:

ITEM_PIPELINES = [
    'event.pipelines.FilePipeline',
]

However, when I run it using

scrapy crawl my_spider

I get a line that says

2010-11-03 20:24:06+0000 [scrapy] DEBUG: Enabled item pipelines:

with no pipelines (I presume this is where the logging should output them).

I have tried looking through the documentation, but there don't seem to be any full examples of a whole project that I could use to check whether I have missed anything.

Any suggestions on what to try next, or where to look for further documentation?

Jim Jeffries

2 Answers

Got it! The ITEM_PIPELINES line needs to go in the settings module for the project, not in the spider class. Now it works!
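For anyone landing here later, a minimal sketch of what that settings entry might look like. The file name settings.py and the order value 300 are assumptions for illustration; the class path is the one from the question. Newer Scrapy releases expect a dict that maps each pipeline path to an integer order, whereas the list form shown in the question is the older style.

```python
# settings.py (the project's settings module; file name assumed)

# Older Scrapy releases, as in the question, accepted a plain list:
# ITEM_PIPELINES = ['event.pipelines.FilePipeline']

# Newer releases expect a dict mapping each pipeline's dotted path to an
# integer order; lower values run first. 300 is an arbitrary choice here.
ITEM_PIPELINES = {
    'event.pipelines.FilePipeline': 300,
}
```

With the setting in place, the "Enabled item pipelines" log line should list the pipeline class when the spider starts.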

Jim Jeffries

I'm willing to bet that it's a capitalisation difference in the word pipeline somewhere:

Pipeline vs. PipeLine

I notice 'event.pipelines.FilePipeline' uses the former, whereas your code uses the latter: which do your filenames use?

(I have fallen victim to this spelling mistake many times!)
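One way to rule out a spelling or capitalisation mismatch is to resolve the dotted path by hand in a Python shell. This is only a rough sketch using the standard library, not how Scrapy itself loads pipelines; the helper name resolve_dotted_path is made up for illustration.

```python
import importlib


def resolve_dotted_path(path):
    """Import 'package.module.ClassName' and return the named attribute.

    Raises ImportError if the module part cannot be imported and
    AttributeError if the final name (case-sensitive) is not found in it.
    """
    module_path, _, attr_name = path.rpartition('.')
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


# Demonstration with a standard-library class, since the project in the
# question is not available here. For the real check you would pass
# 'event.pipelines.FilePipeline' from inside the project directory.
print(resolve_dotted_path('collections.OrderedDict').__name__)
```

If the call raises AttributeError, the class name in the setting does not match the name defined in the module.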

James