I am trying to implement a spider in Scrapy, and I get an error when I run it. I have tried several things but could not resolve it. The error is as follows:

runspider: error: Unable to load 'articleSpider.py': No module named 'wikiSpider.wikiSpider'

I am still learning Python as well as the Scrapy package, but I think this has to do with importing a module from a different directory, so I have included the directory tree of my virtual environment (created in PyCharm) in the image below.

[image: project directory tree]

Also note that I am using Python 3.9 as the interpreter for my virtual environment.

The code I am using for the spider is as follows:

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule
from wikiSpider.wikiSpider.items import Article


class ArticleSpider(CrawlSpider):

   name = 'articleItems'
   allowed_domains = ['wikipedia.org']
   start_urls = ['https://en.wikipedia.org/wiki/Benevolent'
              '_dictator_for_life']
   rules = [Rule(LinkExtractor(allow='(/wiki/)((?!:).)*$'),
         callback='parse_items', follow=True)]

   def parse_items(self, response):
      article = Article()
      article['url'] = response.url
      article['title'] = response.css('h1::text').extract_first()
      article['text'] = response.xpath('//div[@id='
                                     '"mw-content-text"]//text()').extract()

      lastUpdated = response.css('li#footer-info-lastmod::text').extract_first()
      article['lastUpdated'] = lastUpdated.replace('This page was last edited on ', '')
      return article

And this is the code in the file that generates the error:

import scrapy


class Article(scrapy.Item):
   url = scrapy.Field()
   title = scrapy.Field()
   text = scrapy.Field()
   lastUpdated = scrapy.Field()
Asanka.S

1 Answer


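The problem is the leading "wikiSpider" in this import:

from "wikiSpider".wikiSpider.items import Article

Change the name of that outer folder, then edit the import to:

from wikiSpider.items import Article

The likely reason is that Python resolves the import relative to the project folder (the one that contains the inner wikiSpider package), so the package is visible as wikiSpider, not as wikiSpider.wikiSpider.

That solved it.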
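
The import behaviour behind this fix can be reproduced without Scrapy. The following is a minimal sketch, assuming the layout implied by the import in the question (an outer wikiSpider folder containing an inner wikiSpider package with items.py) and assuming that the outer folder is the one that ends up on sys.path. The helper names build_layout, try_import and check_imports are hypothetical, and the Article class is a plain stand-in rather than the real scrapy.Item.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_layout(root: Path) -> Path:
    """Create <root>/wikiSpider/wikiSpider/items.py and return the outer folder."""
    project_dir = root / "wikiSpider"         # outer folder (assumed project root)
    package_dir = project_dir / "wikiSpider"  # inner package
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")
    # Stand-in for the real items.py; a plain class keeps the demo Scrapy-free.
    (package_dir / "items.py").write_text("class Article(dict):\n    pass\n")
    return project_dir


def forget_wikispider_modules() -> None:
    """Drop cached wikiSpider modules so each import attempt starts fresh."""
    for name in list(sys.modules):
        if name == "wikiSpider" or name.startswith("wikiSpider."):
            del sys.modules[name]
    importlib.invalidate_caches()


def try_import(module_name: str) -> bool:
    """Return True if module_name imports, False on ModuleNotFoundError."""
    forget_wikispider_modules()
    try:
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False


def check_imports() -> dict:
    """Try both import paths with only the outer folder on sys.path."""
    with tempfile.TemporaryDirectory() as tmp:
        project_dir = build_layout(Path(tmp))
        sys.path.insert(0, str(project_dir))
        try:
            return {
                "wikiSpider.items": try_import("wikiSpider.items"),
                "wikiSpider.wikiSpider.items": try_import(
                    "wikiSpider.wikiSpider.items"
                ),
            }
        finally:
            sys.path.remove(str(project_dir))
            forget_wikispider_modules()


if __name__ == "__main__":
    for name, ok in check_imports().items():
        print(name, "imports" if ok else "fails")
```

With only the outer folder on the path, the name wikiSpider refers to the inner package, which has no submodule called wikiSpider; that is why the longer import path fails while the shorter one works.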

    Welcome to StackOverflow. While this code may solve the question, including an explanation of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. – Federico Baù Jan 13 '21 at 10:09