4

I have a django script that should be run at a specified time every day. I am trying to achieve this using crontab. The script is supposed to dump the database, archive it using gzip and upload it to bitbucket.

The following is the relevant part of my crontab file:

00 4    * * *   root    python /my_django_project_path/manage.py update_locations
47 16   * * *   root    python /my_django_project_path/manage.py database_bu

When I execute python /my_django_project_path/manage.py database_bu it works perfectly fine. However crontab either does not execute it or something happens along the way. Even weirder, the first crontab command (update_locations) is executed perfectly fine.

Reading this question, I have tried the following, without success:

Changing the command to:

47 16   * * *   root    (cd /my_django_project_path/ && python manage.py database_bu)

Changing the command to:

47 16   * * *   root    /usr/bin/python /my_django_project_path/manage.py database_bu

Adding the following to my script (even though the other one works fine without it):

#!/usr/bin/python

from django.core.management import setup_environ
import settings
setup_environ(settings)

Running everything through a script that exports the django project settings:

/my_django_project_path/cron_command_executor.sh:

export DJANGO_SETTINGS_MODULE=my_django_project.settings 
python manage.py ${*}

The following in crontab:

47 16   * * *   root    ./my_django_project_path/cron_command_executor.sh database_bu

Changing the user to both my user and the Apache user (www-data).

I have a newline at the end of my crontab file.

UPDATE:

When doing sudo su, running the command manually no longer works. It gets stuck and doesn't do anything.

The output of tail -f /var/log/syslog is:

Mar 3 18:26:01 my-ip-address cron[726]: (system) RELOAD (/etc/crontab) 
Mar 3 18:26:01 my-ip-address CRON[1184]: (root) CMD (python /my_django_project_path/manage.py database_bu)

UPDATE:

I am using the following .netrc file to prevent git asking for credentials:

machine bitbucket.org
    login myusername
    password mypassword

The actual code for the backup script is:

import subprocess
import sh
import datetime
import gzip
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    def handle(self, *args, **options):
        execute_backup()

FILE_NAME = 'some_file_name.sql'
ARCHIVE_NAME = 'some_archive_name.gz'
REPO_NAME    = 'some_repo_name'
GIT_USER = 'some_git_username' # You'll need to change this in .netrc as well.
MYSQL_USER   = 'some_mysql_user'
MYSQL_PASS   = 'some_mysql_pass'
DATABASE_TO_DUMP = 'SomeDatabase' # You can use --all-databases but be careful with it! It will dump everything!.

def dump_dbs_to_gzip():
    # Dump arguments.
    args = [
        'mysqldump', '-u', MYSQL_USER, '-p%s' % (MYSQL_PASS),
        '--add-drop-database',
        DATABASE_TO_DUMP,
    ]
    # Dump to file.
    dump_file = open(FILE_NAME, 'w')
    mysqldump_process = subprocess.Popen(args, stdout=dump_file)
    retcode = mysqldump_process.wait()
    dump_file.close()
    if retcode > 0:
        print 'Back-up error'
    # Compress.
    sql_file = open(FILE_NAME, 'r')
    gz_file = gzip.open(ARCHIVE_NAME, 'wb')
    gz_file.writelines(sql_file)
    gz_file.close()
    sql_file.close()
    # Delete the original file.
    sh.rm('-f', FILE_NAME)

def clone_repo():
    # Set the repository location.
    repo_origin = 'https://%s@bitbucket.org/%s/%s.git' % (GIT_USER, GIT_USER, REPO_NAME)

    # Clone the repository in the /tmp folder.
    sh.cd('/tmp')
    sh.rm('-rf', REPO_NAME)
    sh.git.clone(repo_origin)
    sh.cd(REPO_NAME)

def commit_and_push():
    # Commit and push.
    sh.git.add('.')
    sh.git.commit(m=datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"))
    sh.git.push('origin', 'master')
    sh.cd('..')
    sh.rm('-rf', REPO_NAME)

def execute_backup():
    clone_repo()
    dump_dbs_to_gzip()
    commit_and_push()

if __name__ == "__main__":
    execute_backup()

UPDATE:

I managed to fix it using Chris Clark's suggestion of calling the script directly rather than through manage.py. However, I am still interested in what is causing this issue so the bounty is still available.

UPDATE [SOLVED]:

Adding the following line to /etc/environment and running it as my user account rather than root fixed it:

PWD=/my_django_project_path/helpers/management/commands

I still wonder why only my user can run it so if anyone has the solution to that, please contribute.

Vlad Schnakovszki
  • can you do `tail -f /var/log/syslog` to see if there are CRON errors? – jperelli Mar 03 '14 at 14:02
  • What happens if you do `su` for whatever user will execute the command and try to run it? – Stefano Sanfilippo Mar 03 '14 at 15:25
  • @jperelli, this is related to crontab: Mar 3 18:26:01 my-ip-address cron[726]: (*system*) RELOAD (/etc/crontab) Mar 3 18:26:01 my-ip-address CRON[1184]: (root) CMD (python /my_django_project_path/manage.py database_bu) – Vlad Schnakovszki Mar 03 '14 at 15:28
  • @StefanoSanfilippo apparently it does nothing (gets stuck)... – Vlad Schnakovszki Mar 03 '14 at 15:33
  • Then it is not an issue with cron. Maybe wrong permissions? Not only should you check the script itself, but also all the external files and resources that are needed. For a starter, try running with `su whatever -c "python -v ...."` to see verbose Python info about modules and files loaded. – Stefano Sanfilippo Mar 03 '14 at 16:16
  • Doing this produces the following output after which it gets stuck: http://hastebin.com/kijutevexe.avrasm – Vlad Schnakovszki Mar 03 '14 at 16:22
  • When doing Ctrl-C, the program outputs: http://hastebin.com/jotojafujo.vala – Vlad Schnakovszki Mar 03 '14 at 16:23
  • Did you try using absolute paths like /Users/dev/projects/blog/bin/python /Users/dev/projects/blog/blog/blog/manage.py runserver – Anup Mar 05 '14 at 18:18
  • Yes, that didn't work either. – Vlad Schnakovszki Mar 05 '14 at 18:27
  • Your `KeyboardInterrupt` traceback seems to indicate that you call an external command via the `sh` package, and that command hangs. Try to work out which command that is. Check its output. Try attaching [`strace`](http://linux.die.net/man/1/strace) to that command’s process and figuring out what it’s stuck on. – Vasiliy Faronov Mar 05 '14 at 19:09
  • @VasiliyFaronov, here's the output of strace. I don't see anything helpful in there... I tried to redirect the output to a file but the file remains empty as I have to use Ctrl-C to end it. This is what I could copy from the terminal: http://hastebin.com/xohevewuye.vhdl – Vlad Schnakovszki Mar 06 '14 at 17:35
  • @VladSchnakovszki Your hastebin links are outdated. Check my answer, let me know if that helps. – Anshul Goyal Mar 11 '14 at 01:06

4 Answers

4

Since some version of python /my_django_project_path/manage.py database_bu works for you, the problem is with your cron environment or with the way you have set up your cron job, not with the script itself (that is, the size of the file to be uploaded or network connectivity is not causing the issue).

Firstly, you are running the script as

47 16 * * * root python /my_django_project_path/manage.py database_bu

You are providing it a username root, which is not the same user as your current user, while the shell command worked for your current user. The fact that the same command doesn't run from root user using sudo su suggests that your root user account is not properly configured anyway. FWIW, scheduling something as root should almost always be avoided because it can lead to weird file permission issues.

So try scheduling your cron job as follows from that current user.

47 16 * * * cd /my_django_project_path/ && python manage.py database_bu

This may still not run the cron job completely, in which case the problem could be in one of two places: your shell environment has some variables that are missing from your cron environment, or your .netrc file is not being read properly for credentials (or both).

In my experience, the PATH variable causes the most trouble, so run echo $PATH in your shell, and if the value you get is /some/path:/some/other/path:/more/path/values, run your cron job like

47 16 * * * export PATH="/some/path:/some/other/path:/more/path/values" && cd /my_django_project_path/ && python manage.py database_bu

If this doesn't work out, check all the environment variables next.

Use printenv > ~/environment.txt from a normal shell to dump all the environment variables set in the shell. Then use the cron entry * * * * * printenv > ~/cron_environment.txt to dump the cron environment and identify the variables missing from it. Alternatively, you can use the following snippet to print the environment from within the script

import os
os.system("printenv")

Compare the two, figure out any other relevant variables that differ (like HOME), and try setting them within the script or cron entry to check whether that makes it work.
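The comparison itself can be scripted. Here is a minimal sketch, assuming both dumps use the one KEY=value per line format that printenv emits. It is written for Python 3, the helper names parse_env and diff_env are hypothetical, the sample values are made up for illustration, and variables whose values span several lines are not handled.

```python
def parse_env(text):
    """Parse printenv-style text (one KEY=value per line) into a dict."""
    env = {}
    for line in text.splitlines():
        # Skip blank lines and anything that is not KEY=value.
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key] = value
    return env


def diff_env(shell_env, cron_env):
    """Return (missing, changed).

    missing: variables set in the shell but absent from cron.
    changed: variables set in both but with different values.
    """
    missing = sorted(k for k in shell_env if k not in cron_env)
    changed = sorted(
        k for k in shell_env
        if k in cron_env and shell_env[k] != cron_env[k]
    )
    return missing, changed


if __name__ == "__main__":
    # Made-up sample dumps; in practice read the two files written by printenv.
    shell_dump = "PATH=/usr/local/bin:/usr/bin:/bin\nHOME=/home/me\nPWD=/srv/app\n"
    cron_dump = "PATH=/usr/bin:/bin\nHOME=/home/me\n"
    missing, changed = diff_env(parse_env(shell_dump), parse_env(cron_dump))
    print("missing from cron:", missing)
    print("different in cron:", changed)
```

To use it on real data, replace the two sample strings with the contents of ~/environment.txt and ~/cron_environment.txt.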

If things still don't work out, then I think the remaining problem is with your bitbucket credentials in .netrc, in which you are saving the username and password. The contents of .netrc might not be available in the cron environment.

Instead, create and set up an ssh keypair for your account and let the backup happen over ssh instead of https (it's better to generate an ssh key without a passphrase in this step, to avoid ssh-keys' gotchas).

Once you have set up the ssh keys, you will have to edit the existing origin url in the .git/config file of your project root accordingly (or add a new remote origin_ssh using git remote add origin_ssh url for the ssh protocol).

Note that the https url for the repo looks like https://user@bitbucket.org/user/repo.git while the ssh one looks like git@bitbucket.org:user/repo.git.
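Since the backup script builds the https url in code, the ssh form can be derived the same way. This is only a sketch of the mapping between the two url shapes shown above, written for Python 3; the function name https_to_ssh is hypothetical, and it assumes the ssh user is always git, as in the example url.

```python
from urllib.parse import urlparse


def https_to_ssh(url):
    """Turn https://user@host/owner/repo.git into git@host:owner/repo.git."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError("expected an https repository url, got %r" % url)
    # Drop the leading slash so the path follows the colon directly.
    path = parsed.path.lstrip("/")
    if not path:
        raise ValueError("url has no repository path: %r" % url)
    return "git@%s:%s" % (parsed.hostname, path)


if __name__ == "__main__":
    print(https_to_ssh("https://user@bitbucket.org/user/repo.git"))
```

The result is what you would pass to git remote add origin_ssh, or use as repo_origin in clone_repo().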

PS: bitbucket, or rather git, is not the ideal solution for backups; there are tons of threads around on better backup strategies. Also, while debugging, run your cron jobs every minute (* * * * *), or at a similarly high frequency, for faster debugging.

Edit

OP says in the comments that adding the following line to /etc/environment worked for him:

PWD=/my_django_project_path/helpers/management/commands

This is what I had suggested earlier: one of the environment variables available in the shell was not present in the cron environment.

In general, cron always runs with a reduced set of environment variables and permissions, and setting the right variables will make cron work.
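One way to reproduce this without waiting for the scheduler is to launch the command yourself with a deliberately small environment. The sketch below is Python 3 and makes assumptions: the variables and values in CRON_LIKE_ENV only approximate what cron provides (the exact defaults depend on the cron implementation and the user), and run_like_cron is a hypothetical helper name.

```python
import subprocess
import sys

# Assumed approximation of cron's environment; check your own cron's
# printenv output for the real values.
CRON_LIKE_ENV = {
    "PATH": "/usr/bin:/bin",
    "SHELL": "/bin/sh",
    "HOME": "/root",
    "LOGNAME": "root",
}


def run_like_cron(argv, timeout=60):
    """Run argv with only the cron-like variables set.

    The child does not inherit the current shell's environment, so
    variables such as PWD are absent unless the child sets them itself.
    """
    return subprocess.run(
        argv,
        env=CRON_LIKE_ENV,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # List the variable names the child process actually sees.
    result = run_like_cron(
        [sys.executable, "-c", "import os; print(sorted(os.environ))"]
    )
    print(result.stdout.strip())
```

Replacing the argument list with the manage.py invocation shows whether the command still works when the shell's variables are gone.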

Also, since you are using a .netrc file for credentials, it is specific to your account and therefore won't work with any other account (including root), unless you configure the same settings in that other account as well.

Anshul Goyal
  • Thanks for the reply. Trying printenv both directly and from a python script using crontab did not work at all. No file would be output, no matter what account I ran it under. It would work if called from the shell though. Maybe this would be the problem? I managed to get the script to work using one of the above answers but I am still curious as to what is causing this so the 50 rep is still available. – Vlad Schnakovszki Mar 11 '14 at 13:01
  • How did you run the job? I scheduled one like `* * * * * printenv >> /home/mu/test_printenv.text` and it worked for me. The output is in the file. – Anshul Goyal Mar 11 '14 at 13:16
  • Ahhhh I was using > instead of >>. It worked with >>. I'll check the output and let you know. – Vlad Schnakovszki Mar 11 '14 at 13:21
  • As for scheduling the cron out of django, as suggested in another answer, I actually thought a backup script could be written in shell script and mistakenly assumed that you were using manage.py to be able to extend the task later in codebase itself (like on demand backup and stuff). – Anshul Goyal Mar 11 '14 at 13:21
  • So here's the output from running printenv from the shell: http://pastebin.com/tGEJk1XL and here's the output from running `printenv` from `crontab` http://pastebin.com/PD9Zxrp0 . – Vlad Schnakovszki Mar 11 '14 at 14:06
  • Turns out adding `PWD=/my_django_project_path/helpers/management/commands` to `/etc/environment` fixed it! You led me in the right direction so the bounty is yours if you edit your answer to explain why this is the case. Thank you! – Vlad Schnakovszki Mar 11 '14 at 14:07
  • Cool! Have edited my answer, check that for further explanation. Let me know if you have further queries. – Anshul Goyal Mar 11 '14 at 19:10
2

That reminds me of a very frustrating gotcha. Do you have a newline at the end of your crontab file? From man crontab:

...cron requires that each entry in a crontab end in a newline character. If the last entry in a crontab is missing the newline, cron will consider the crontab (at least partially) broken and refuse to install it.

user3387819
2

This is also a shot in the dark - our team has had issues running management commands through cron. We never bothered to track down why they were flaky, but after much hair-pulling we reverted to invoking the python functions directly rather than going through manage.py and things have been humming along fine ever since.

Chris Clark
  • +1 Thanks, this fixed the problem. However, I am interested in what is causing this behaviour so I will award that reputation to whoever finds out why it's happening. – Vlad Schnakovszki Mar 11 '14 at 13:02
0

I’m not very good at reading strace output, but I think the one you posted indicates that your program has invoked git and is awaiting its termination. You mention uploading to BitBucket, so here’s a shot in the dark: git tries to push to an ssh remote; when you run it as yourself, ssh-agent authenticates you transparently; but when you run it as root, there’s no ssh-agent, thus git prompts for ssh password and awaits your input.

Try doing the git invocation manually under sudo su and check.

If this does not help, you need to get at the output of git (or whatever it is you’re actually invoking there). Check the documentation for the sh package for details on how to redirect the standard output and standard error.
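If the sh documentation is not at hand, the same diagnostics can be gathered with the standard library. To be clear, this sketch swaps sh for subprocess (Python 3) and is not how the original script does it; run_logged is a hypothetical helper name and the 120 second default is an arbitrary choice.

```python
import subprocess
import sys


def run_logged(argv, timeout=120):
    """Run argv and return (returncode, stdout, stderr).

    stdin is closed, so a command that reads from standard input gets
    end-of-file instead of waiting. If the command runs longer than
    `timeout` seconds it is killed and returncode is None.
    """
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None, "", "timed out after %s seconds" % timeout
    return proc.returncode, proc.stdout, proc.stderr


if __name__ == "__main__":
    code, out, err = run_logged(
        [sys.executable, "-c", "print('hello from the child')"]
    )
    print("exit code:", code)
    print("stdout:", out.strip())
    print("stderr:", err.strip())
```

Wrapping each git call this way and writing the three values to a log file shows which step hangs and what it printed before doing so.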

Vasiliy Faronov
  • I am using a .netrc file containing the credentials so git should not ask for credentials. Please check the updated question. – Vlad Schnakovszki Mar 09 '14 at 13:46
  • @VladSchnakovszki Sorry, but it’s hard to help you when you’re so reluctant to debug and post diagnostics. It’s good to know you have a `.netrc`. But it’s clear you’re starting a process there and it doesn’t work, and it is amenable to inspection. As I said, can you try doing the `git` invocation manually under `sudo su`? Can you add output logging to your `sh` calls? – Vasiliy Faronov Mar 10 '14 at 16:49