
I have a Scrapy crawler on an Elastic Beanstalk app that I can run over SSH like this:

  • source /opt/python/run/venv/bin/activate
  • source /opt/python/current/env
  • cd /opt/python/current/app
  • scrapy crawl spidername
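
Taken together, those four steps can be collected into one wrapper script that cron could call (a sketch; the paths are the Beanstalk defaults quoted above and are assumed to exist on the instance — cron runs with a minimal environment, so the virtualenv activation and env sourcing from the SSH session have to happen inside the script too):

```shell
# Sketch of a wrapper reproducing the interactive SSH session.
# /tmp/runcrawler.sh is an illustrative location for this example.
cat > /tmp/runcrawler.sh <<'EOF'
#!/bin/bash
source /opt/python/run/venv/bin/activate   # activate the app's virtualenv
source /opt/python/current/env             # load Beanstalk environment variables
cd /opt/python/current/app
scrapy crawl spidername
EOF
chmod +x /tmp/runcrawler.sh

# Syntax-check the generated script without actually running the crawl.
bash -n /tmp/runcrawler.sh && echo "syntax ok"
```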

I want to set up a cron job to run this for me, so I followed the suggestions here.

My setup.config file looks like this:

container_commands:
  01_cron_hemnet:
    command: "cat .ebextensions/spider_cron.txt > /etc/cron.d/crawl_spidername && chmod 644 /etc/cron.d/crawl_spidername"
    leader_only: true

My spider_cron.txt file looks like this:

# The newline at the end of this file is extremely important. Cron won't run without it.
* * * * * root sh /opt/python/current/app/runcrawler.sh &>/tmp/mycommand.log
# There is a newline here.
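
For reference, the two format requirements this file depends on — the sixth "user" field that /etc/cron.d entries need, and the trailing newline — can be sanity-checked on a copy of the file (a sketch; the mktemp path is illustrative, since writing to /etc/cron.d itself needs root):

```shell
# Sketch: sanity-check a cron.d-style entry before installing it.
cronfile=$(mktemp)
printf '* * * * * root sh /opt/python/current/app/runcrawler.sh &>/tmp/mycommand.log\n' > "$cronfile"

# Files in /etc/cron.d need a "user" field between the schedule and the command.
cronuser=$(awk '{print $6}' "$cronfile")
echo "user field: $cronuser"

# cron ignores a file whose last line is missing its newline.
if [ -z "$(tail -c 1 "$cronfile")" ]; then
  echo "trailing newline ok"
else
  echo "missing trailing newline"
fi
```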

My runcrawler.sh file is located at /opt/python/current/app/runcrawler.sh and looks like this:

#!/bin/bash

cd /opt/python/current/app/
PATH=$PATH:/usr/local/bin
export PATH
scrapy crawl spidername

I can navigate to /etc/cron.d/ and see that crawl_spidername exists there, but when I run crontab -l or crontab -u root -l it says that no crontab exists.

I get no log errors, no deployment errors, and the /tmp/mycommand.log file that I redirect the cron output to is never created. It seems like the cron job is never started.

Ideas?

Marcus Lind
  • You sure that your code is error free? – Chiyaan Suraj Apr 13 '15 at 10:01
  • No errors in log, no deployment errors, and I can run "scrapy crawl spidername" by SSH without errors. It's just that the cronjob does not run, or maybe it does run but the command does not do anything(?). Is it correct to write * * * * * username path command, the way I do it? – Marcus Lind Apr 13 '15 at 11:34

1 Answer


Your spider_cron.txt has an extra space after /opt/python/current/app/ and before scrapy, so the command being run is just the folder /opt/python/current/app/.

Yours

40 9 * * * root /opt/python/current/app/ scrapy crawl spidername > /dev/null

Should be

40 9 * * * root /opt/python/current/app/scrapy crawl spidername > /dev/null

Does typing EXACTLY "/opt/python/current/app/scrapy crawl spidername" start your crawler?

greg_diesel
  • No, Scrapy is not a file in /app/. It's an installed command. Your solution does not work, and is not correct. I just updated my question with some changes I've done, that still is not working. – Marcus Lind Apr 13 '15 at 14:51
  • Now that you have edited the question, it looks close to working. If you log in and don't change directories, can you run your script by just typing "/opt/python/current/app/runcrawler.sh"? – greg_diesel Apr 13 '15 at 18:18
  • It works if I do `sudo crontab crawl_spidername` and add it to the root crontab. Everything works fine when I do that. But it seems that just putting the file in `/etc/cron.d/` is not enough to make cron actually load and run it. So this means I have to go into the instance and add the cron job every time I restart the server, or something like that. – Marcus Lind Apr 14 '15 at 03:01
  • Here is another Stack Overflow thread that deals with adding items to cron from a script: http://stackoverflow.com/questions/4880290/linux-how-do-i-create-a-crontab-thru-a-script In particular I would look at the post by Joe Casadonte: http://stackoverflow.com/a/9625233/4179009 – greg_diesel Apr 14 '15 at 13:26
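
Following the approach in the linked answer, the entry could be installed into root's per-user crontab at deploy time from .ebextensions instead of copying a file into /etc/cron.d (a sketch, untested; note that per-user crontab lines omit the user field that /etc/cron.d files require):

```yaml
container_commands:
  01_cron_hemnet:
    # Append the entry to root's crontab, deduplicating on redeploys.
    # Per-user crontab lines have no "root" user field, unlike /etc/cron.d files.
    command: "(crontab -l 2>/dev/null; echo '* * * * * sh /opt/python/current/app/runcrawler.sh &>/tmp/mycommand.log') | sort -u | crontab -"
    leader_only: true
```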