I have a script that runs multiple instances of Python Scrapy crawlers. The script is at /root/crawler/batchscript.py

and the Scrapy crawler project itself is also in /root/crawler/.

Crawlers are working perfectly fine.

batchscript.py looks like this (posting only the relevant code):

from scrapy.crawler import CrawlerProcess
from scrapy.settings import Settings
from scrapy.utils.project import get_project_settings
from amazon_crawler.spiders.amazon_scraper import MySpider

process = CrawlerProcess(get_project_settings())

When I run batchscript.py inside the /root/crawler/ directory, the scraper runs fine.

But when I run it from outside this directory using python /root/crawler/batchscript.py, it does not run as intended: the settings are not loaded correctly, and get_project_settings() returns empty settings.

I have also tried a Bash script. I created one called batchinit.sh:

#!/bin/bash
alias batchscript="cd /root/crawler/"
python batchscript.py

and the behaviour is the same :(

When I run batchinit.sh inside the /root/crawler/ directory, the scraper runs fine.

But when I run it from outside this directory using bash /root/crawler/batchinit.sh, it does not run as intended: the settings are not loaded correctly, and get_project_settings() returns empty settings.

Why am I doing this? What is the ultimate goal?

I want to create a cron job for this script. I tried to schedule cron jobs using the commands above, but I ran into the issues described.

Umair Ayub
  • What are you trying to do by defining the alias in the shell script? Why not just put cd /root/crawler/ on that line instead of aliasing it to batchscript? – Christopher Shroba Nov 17 '16 at 18:42
  • Where are the `scrapy` and `amazon_crawler` modules? Are they in a virtual env? – lucasnadalutti Nov 17 '16 at 18:42
  • This may help: http://stackoverflow.com/a/22466264/2874789 – Christopher Shroba Nov 17 '16 at 18:44
  • @ChristopherShroba I am a newbie to shell scripting ... I wrote a simple cd command but it didn't work ... – Umair Ayub Nov 17 '16 at 18:46
  • See: [How do I “cd” in Python?](http://stackoverflow.com/q/431684/3776858) – Cyrus Nov 17 '16 at 19:25
  • just to shed some light on the bash script, the line `alias batchscript="cd /root/crawler"` doesn't actually do a `cd` -- it simply sets up an alias `batchscript` which will cd to that directory if it's run as a bash command – John Nov 17 '16 at 20:45

1 Answer

Using bash, you could always change into the project directory first and only then start the script:

cd /root/crawler && python batchscript.py

It's always good policy to use absolute paths to the programs/executables referenced in cron jobs.
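
The same idea can also be handled inside the Python script, so that it no longer matters which directory the caller (or cron) starts from. The sketch below assumes that Scrapy finds the project by looking for scrapy.cfg starting from the current working directory, and that scrapy.cfg sits next to batchscript.py in /root/crawler/. The helper names enter_project_dir and build_process are made up for illustration and are not part of Scrapy or of the original script.

```python
import os
from pathlib import Path


def enter_project_dir(script_file):
    """Change the working directory to the folder containing script_file.

    Hypothetical helper: assumes scrapy.cfg lives in that same folder.
    """
    project_dir = Path(script_file).resolve().parent
    os.chdir(project_dir)
    return project_dir


def build_process():
    """Create a CrawlerProcess using the project settings.

    Scrapy is imported here, after the directory change, so the
    settings lookup starts from the project directory.
    """
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    return CrawlerProcess(get_project_settings())


# Intended use at the top of batchscript.py (left as comments here):
#   enter_project_dir(__file__)
#   process = build_process()
```

With this in place, calling python /root/crawler/batchscript.py from any directory should behave like running it from inside /root/crawler/, provided the assumption about scrapy.cfg holds for your project layout.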

matias elgart