I am using scrapy 1.5.2 with python 3.
I have a very simple spider and I created a small pipeline to transform the date field of my item.
Here is my tree folder of my project "entreprises" : http://prntscr.com/o8axfc
As you can see in this screenshot, I created a folder "pipelines" where I added the tidyup.py
file where I added this code:
from datetime import datetime
class TidyUp(object):
def process_item(self, item, spider):
item['startup_date_creation']= map(datetime.isoformat, item['startup_date_creation'])
return item
You can see also in my screenshot I added in settings.py
of my project the parameters :
ITEM_PIPELINES = {'entreprises.pipelines.tidyup.TidyUp': 100}
Here is the code of my spider usine-digitale2.py:
# -*- coding: utf-8 -*-
import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from scrapy.utils.response import open_in_browser
def parse_details(self,response):
if "item_name" not in response.body:
open_in_browser(response)
item=response.mega.get('item',None)
if item:
return item
else:
self.logger.warning("pas d'item reçu pour %s", response.url)
class UsineDigital2Spider(CrawlSpider):
name = 'usine-digital2'
allowed_domains = ['website.fr']
start_urls = ['https://www.website.fr/annuaire-start-up/']
rules = (
Rule(LinkExtractor(restrict_xpaths="//*[@rel='next']")),
Rule(LinkExtractor(restrict_xpaths="//*[@itemprop='url']"),
callback='parse_item')
)
def parse_item(self, response):
i = {}
i["startup_name"] = response.xpath("//h1/text()").extract()
i["startup_date_creation"] = response.xpath("//*[@itemprop='foundingDate']/@content").extract()
i["startup_website"] = response.xpath ("//*[@id='infoPratiq']//a/@href").extract()
i["startup_email"] = response.xpath ("//*[@itemprop='email']/text()").extract()
i["startup_address"] = response.xpath ("//*[@id='infoPratiq']//p/text()").extract()
i["startup_founders"] = response.xpath ("//*[@itemprop='founders']/p/text()").extract()
i["startup_market"] = response.xpath ("//*[@id='ficheStartUp']/div[1]/article/div[6]/p").extract()
i["startup_description"] = response.xpath ("//*[@itemprop='description']/p/text()").extract()
i["startup_short_description"] = response.xpath ("//*[@itemprop='review']/p").extract()
return i
When I run the command :
scrapy crawl usine-digital2 -s CLOSESPIDER_ITEMCOUNT=30
I get this error message :
ModuleNotFoundError: No module named 'entreprises.pipelines.tidyup'; 'entreprises.pipelines' is not a package
And here is log in my terminal :
I searched everywhere in my code. I don't see any errors. This code is from the book "Learn Scrapy" (from Dimitrios Kouzis-loukas) where I follow instructions. I don't understand why it doesn't work.
You can find all source code of scrapy project "entreprises" here :
https://github.com/FormationGrowthHacking/scrapy/tree/master/entreprises
As I am reading the book "Learn Scrapy", you can easily guess I am a newbie tring to develop his first scraper. I would appreciate a lot the help of some expert.
Kind regards