I am trying to implement a spider in Scrapy, and I get an error when I run it. I have tried several things but could not resolve it. The error is as follows:
runspider: error: Unable to load 'articleSpider.py': No module named 'wikiSpider.wikiSpider'
I am still learning Python as well as the Scrapy package, but I think this has to do with importing a module from a different directory, so I have included the directory tree of my virtual environment (created in PyCharm) in the image below.
Also note that I am using Python 3.9 as the interpreter for my virtual environment.
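To test the suspicion about the import path, a small diagnostic along these lines could be run from the same directory where the spider is launched. It is only a sketch: `can_import` is a helper name made up for illustration, and the dotted names in the loop are guesses based on the import line in the spider, not confirmed package paths.

```python
import importlib.util
import sys


def can_import(dotted_name):
    """Return True if `dotted_name` can be located on the current sys.path."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # find_spec imports parent packages first; a missing parent
        # (e.g. the outer `wikiSpider`) raises ModuleNotFoundError.
        return False


if __name__ == "__main__":
    # The directories Python searches, in order.
    print("\n".join(sys.path))
    # Hypothetical candidates, taken from the import in the spider.
    for name in ("wikiSpider", "wikiSpider.wikiSpider", "wikiSpider.items"):
        print(name, "->", can_import(name))
```

If `wikiSpider.wikiSpider` comes back as `False` while a shorter path comes back as `True`, that would suggest the import line does not match the directory the interpreter starts from.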
The code I am using for the spider is as follows:
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from wikiSpider.wikiSpider.items import Article

class ArticleSpider(CrawlSpider):
    name = 'articleItems'
    allowed_domains = ['wikipedia.org']
    start_urls = ['https://en.wikipedia.org/wiki/Benevolent'
                  '_dictator_for_life']
    rules = [Rule(LinkExtractor(allow='(/wiki/)((?!:).)*$'),
                  callback='parse_items', follow=True)]

    def parse_items(self, response):
        article = Article()
        article['url'] = response.url
        article['title'] = response.css('h1::text').extract_first()
        article['text'] = response.xpath('//div[@id='
                                         '"mw-content-text"]//text()').extract()
        lastUpdated = response.css('li#footer-info-lastmod::text').extract_first()
        article['lastUpdated'] = lastUpdated.replace('This page was last edited on ', '')
        return article
And this is the code in the file that generates the error:
import scrapy

class Article(scrapy.Item):
    url = scrapy.Field()
    title = scrapy.Field()
    text = scrapy.Field()
    lastUpdated = scrapy.Field()