I have a script that runs multiple instances of Python Scrapy crawlers. The batch script is /root/crawler/batchscript.py, and the Scrapy crawler itself lives in /root/crawler/. The crawlers are working perfectly fine.
batchscript.py looks like this (posting only the relevant code):
from scrapy.crawler import CrawlerProcess  # needed for CrawlerProcess below
from scrapy.settings import Settings
from scrapy.utils.project import get_project_settings
from amazon_crawler.spiders.amazon_scraper import MySpider

process = CrawlerProcess(get_project_settings())
When I run batchscript.py from inside the /root/crawler/ directory, the scraper runs fine. But when I run it from outside that directory using

python /root/crawler/batchscript.py

it does not run as intended: the settings are not imported correctly, and get_project_settings() returns empty settings.
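From what I can tell from the Scrapy source, get_project_settings() first checks the SCRAPY_SETTINGS_MODULE environment variable and otherwise locates the project by searching for scrapy.cfg upward from the current working directory, which would explain why the working directory matters. If that is right, exporting the variable should make the script location-independent. A minimal sketch, assuming the default project layout so that the settings module is amazon_crawler.settings:

# Assumption: settings live at /root/crawler/amazon_crawler/settings.py
export PYTHONPATH=/root/crawler                        # make the amazon_crawler package importable, for good measure
export SCRAPY_SETTINGS_MODULE=amazon_crawler.settings  # tell Scrapy which settings to load
python /root/crawler/batchscript.py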
I have also tried a bash wrapper. I created a script called batchinit.sh:
#!/bin/bash
alias batchscript="cd /root/crawler/"
python batchscript.py
and the behaviour is the same :(
When I run batchinit.sh from inside the /root/crawler/ directory, the scraper runs fine. But when I run it from outside that directory using

bash /root/crawler/batchinit.sh

it does not run as intended: the settings are not imported correctly, and get_project_settings() returns empty settings.
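Looking at batchinit.sh again, I suspect the alias line is the culprit: aliases are not expanded in non-interactive shells, and the alias is never invoked anyway (python is the command on the last line), so the script never actually changes directory. A plain cd should do what I intended (an untested sketch):

#!/bin/bash
cd /root/crawler/ || exit 1   # actually change into the project directory
python batchscript.py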
Why am I doing this? What is the ultimate goal?
I want to create a cronjob for this script. I tried to schedule cronjobs using the commands mentioned above, but I ran into the same issue: the settings come up empty.
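For reference, this is the kind of crontab entry I am aiming for (assuming an hourly run). Since cron starts jobs with a minimal environment and in the user's home directory, the cd and an absolute interpreter path (here /usr/bin/python, which may differ on other systems) seem necessary:

# run the batch script at the top of every hour, from inside the project directory
0 * * * * cd /root/crawler && /usr/bin/python batchscript.py >> /tmp/batchscript.log 2>&1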