
#I am trying to run a script that meets the following requirements:

  1. After running the demo10.py script, the AmazonfeedSpider will crawl the product information using the generated URLs saved in Purl and save the output into the dataset2.json file.

  2. After the data has been successfully crawled and saved into the dataset2.json file, the ProductfeedSpider will run and grab the 5 URLs returned by the Final_Product() method of the CompareString class.

  3. Finally, after grabbing the final product_url list from the Comparestring4 class, the ProductfeedSpider will scrape data from the returned URL list and save the result into the Fproduct.json file (a rough sketch of the feed setup I have in mind is below).
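
#For context, here is a simplified sketch of how I assume the per-spider output files are wired up through Scrapy's FEEDS setting in custom_settings (the spider names and the parsing logic are guesses/omitted; the real AmazonfeedSpider and ProductfeedSpider may configure this differently):

import scrapy

# Sketch only: each spider points its feed export at its own output file.
class AmazonfeedSpider(scrapy.Spider):
    name = "Amazonfeed2"  # guessed from the module name
    custom_settings = {
        "FEEDS": {"dataset2.json": {"format": "json", "overwrite": True}},
    }
    # parsing logic omitted

class ProductfeedSpider(scrapy.Spider):
    name = "Productfeed"  # guessed from the module name
    custom_settings = {
        "FEEDS": {"Fproduct.json": {"format": "json", "overwrite": True}},
    }
    # parsing logic omitted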

#Here is the demo10.py file:

import scrapy
from scrapy.crawler import CrawlerProcess
from AmazonScrap.spiders.Amazonfeed2 import AmazonfeedSpider
from scrapy.utils.project import get_project_settings
from AmazonScrap.spiders.Productfeed import ProductfeedSpider
import time
# from multiprocessing import Process


# def CrawlAmazon():
    
    
def main():
    process1 = CrawlerProcess(settings=get_project_settings())
    process1.crawl(AmazonfeedSpider)
    process1.start()
    process1.join()
    # time.sleep(20)
    process2 = CrawlerProcess(settings=get_project_settings())
    process2.crawl(ProductfeedSpider)
    process2.start()
    process2.join()


if __name__ == "__main__":
    main()

#When I run the file, it throws an exception saying that the dataset.json file doesn't exist. Do I need to use multiprocessing to create a delay between the spiders? If so, how can I implement it?
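
#Here is a minimal sketch of the multiprocessing approach I have in mind (an assumption on my part: each spider runs in its own child process, so the Twisted reactor is started once per process and ProductfeedSpider only launches after AmazonfeedSpider has exited and dataset2.json has been written):

from multiprocessing import Process

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from AmazonScrap.spiders.Amazonfeed2 import AmazonfeedSpider
from AmazonScrap.spiders.Productfeed import ProductfeedSpider


def run_spider(spider_cls):
    # Each child process gets its own CrawlerProcess and its own reactor
    process = CrawlerProcess(settings=get_project_settings())
    process.crawl(spider_cls)
    process.start()  # blocks until this spider has finished


def main():
    for spider_cls in (AmazonfeedSpider, ProductfeedSpider):
        p = Process(target=run_spider, args=(spider_cls,))
        p.start()
        p.join()  # wait for the child to exit before launching the next spider


if __name__ == "__main__":
    main()

#Would that be a reasonable way to make the second spider wait for dataset2.json, or is chaining the crawls with CrawlerRunner and deferreds (as shown in the Scrapy docs on running multiple spiders in the same process) the better option?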

#I am looking forward to hearing from the experts.
