I have the following Scrapy spider in spiders.py:

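import scrapy


class TwitchSpider(scrapy.Spider):
    name = "clips"

    def start_requests(self):
        urls = [
            "https://www.twitch.tv/wilbursoot/clips?filter=clips&range=7d",
        ]
        # Schedule one request per URL; each response is handed to parse().
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Yield one item per element matching the selector.
        for clip in response.css(".tw-tower"):
            yield {
                "title": clip.css("::text").get(),
            }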

The key point is that I want to call this spider as a function from another file, instead of running scrapy crawl clips in the console. Is this possible at all, and where can I read more about it? I checked the Scrapy documentation, but I didn't find much.

neiii
  • This question has been answered numerous times before (e.g. [here](https://stackoverflow.com/a/31374345), [here](https://stackoverflow.com/a/34987612) and [here](https://stackoverflow.com/a/62248517)), and there's an example in the documentation. I answered your question, but next time do a little more research. – SuperUser Jan 27 '22 at 15:43

2 Answers

Run the spider from main.py:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    spider = 'clips'  # the spider's `name` attribute, not its class name
    settings = get_project_settings()
    # change/update settings:
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider)
    process.start()

See "Run Scrapy from a script" in the Scrapy documentation.
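Since the question asks about calling the spider as a function, the same idea can be wrapped in one. The following is a minimal sketch, not the only way to do it: `build_settings` and `run_spider` are hypothetical helper names, the `LOG_LEVEL` default is an assumption, and the spider class is passed in directly instead of being looked up by name. Items are collected through Scrapy's `item_scraped` signal.

```python
def build_settings(overrides=None):
    """Return a plain dict of Scrapy settings (hypothetical helper)."""
    settings = {"LOG_LEVEL": "WARNING"}  # assumed default, keeps output quiet
    settings.update(overrides or {})
    return settings


def run_spider(spider_cls, overrides=None):
    """Run one spider to completion and return the items it yielded.

    Hypothetical helper, not part of Scrapy. Scrapy is imported here so
    that importing this module alone does not start anything.
    """
    from scrapy import signals
    from scrapy.crawler import CrawlerProcess

    items = []

    def collect(item, response, spider):
        # Scrapy sends this signal each time an item passes the pipelines.
        items.append(item)

    process = CrawlerProcess(settings=build_settings(overrides))
    crawler = process.create_crawler(spider_cls)
    crawler.signals.connect(collect, signal=signals.item_scraped)
    process.crawl(crawler)
    process.start()  # blocks until the crawl is finished
    return items
```

From another file this could be used as `from spiders import TwitchSpider` followed by `clips = run_spider(TwitchSpider)`. Note that the Twisted reactor cannot be restarted, so a function like this can only be called once per Python process.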

SuperUser
  • So I don't need to import the spider itself into main.py in any way? As long as I reference its name, Scrapy will search the project files for the spider? – neiii Feb 11 '22 at 12:17
  • As long as you create a project, no, you don't need to import it. You can even run multiple spiders from main.py. – SuperUser Feb 12 '22 at 08:58
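
Running several spiders from main.py, as mentioned in the comment above, might look like the sketch below. It assumes the script runs inside a Scrapy project so that spiders can be found by name; `run_spiders` is a hypothetical helper name.

```python
def run_spiders(spider_names):
    """Run several project spiders in one process (hypothetical helper)."""
    names = list(spider_names)
    if not names:
        raise ValueError("no spider names given")

    # Imported here so that importing this module does not start anything.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    for name in names:
        # Each name is the spider's `name` attribute, e.g. "clips".
        process.crawl(name)
    process.start()  # blocks until every scheduled spider has finished
```

For example, `run_spiders(["clips", "other_spider"])` would run both spiders in the same process, where `other_spider` stands for a second spider that is assumed to exist in the project.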

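Put your other file in the same directory as your spider file. Then import the spider module (the file is spiders.py, so the module name is spiders):

import spiders

You then have access to everything in the spider file and can create a spider object:

spi = spiders.TwitchSpider()

You can then call methods on that object, for example:

spi.parse(response)

Here response must be a Scrapy response object that you provide yourself, because calling parse() directly does not download the page.

This article shows how to import classes and functions from other Python files: https://csatlas.com/python-import-file-module/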

Sam