
I have nearly 20 arguments to take from the command line for a Scrapy project. The project has nearly 10 web pages to parse, so I have made multiple spiders and I am running them in the same process using the script example from the Scrapy docs, like this:

import logging
import args  # our own module that parses the CLI with argparse

from scrapy.crawler import CrawlerProcess
from scrapy.utils.log import configure_logging

if __name__ == '__main__':
    configure_logging(install_root_handler=False)
    logging.basicConfig(
        filename='scraper.log',
        format='%(created)f - %(levelname)s: %(message)s',
        level=logging.INFO
    )
    settings = args.get_args()
    process = CrawlerProcess(settings)
    process.crawl('Connectors')
    process.crawl('helper')
    # ... more process.crawl() calls for the remaining spiders

    process.start()

I take the arguments from the CLI with the argparse library, put them into a Scrapy settings object, and pass that same object to CrawlerProcess:

settings = args.get_args()
process = CrawlerProcess(settings)
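
get_args() is a thin wrapper around argparse, roughly like this (the real script has ~20 flags; the names below are just illustrative):

import argparse

def get_args():
    parser = argparse.ArgumentParser()
    # ~20 flags in the real script; two shown for brevity
    parser.add_argument('--api_key')
    parser.add_argument('--max_pages', type=int, default=10)
    # parse_args() returns an argparse.Namespace
    return parser.parse_args()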

I am unable to access these arguments in my spiders. If I try to get them with

from scrapy.utils.project import get_project_settings
self.settings = get_project_settings()

I get a new object with all null values. I have also tried getting the args in the spider's __init__ method, but that does not work either. Please point me to a similar project on GitHub or suggest some posts that work through this problem; I have already spent a lot of time on it and have much work to do.

1 Answer


With settings = args.get_args() you are passing argparse's Namespace instance to CrawlerProcess, while you should pass a dict-like object (or a scrapy.settings.Settings instance) instead.
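
A minimal sketch of the fix (the flag names are illustrative): convert the Namespace to a plain dict with vars() and merge it into the project settings before creating the process.

import args  # the question's own argparse wrapper

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == '__main__':
    namespace = args.get_args()        # argparse.Namespace
    settings = get_project_settings()  # loads the project's settings.py
    # vars() turns the Namespace into a dict, which Settings.update() accepts
    settings.update(vars(namespace))
    process = CrawlerProcess(settings)
    process.crawl('Connectors')
    process.start()

Each spider then sees these values through self.settings, which Scrapy binds to the spider when the crawl starts, so there is no need to call get_project_settings() inside the spider:

import scrapy

class ConnectorsSpider(scrapy.Spider):
    name = 'Connectors'

    def start_requests(self):
        # reads the value merged from the command line above
        max_pages = self.settings.getint('max_pages', 10)
        for page in range(1, max_pages + 1):
            # example.com is a placeholder URL
            yield scrapy.Request(f'https://example.com/page/{page}')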

mizhgun