Within my app I have a function which I want to run every hour to collect data and populate a database (I have an RDS database linked to my Elastic Beanstalk app). This is the function I want to run (a static method defined in my Data model):

@staticmethod
def get_data():
    page = requests.get(....)
    soup = BeautifulSoup(page.text, 'lxml')  # parse the response body, not the Response object
    .....
    site_data = Data.objects.create(...)
    site_data.save()

>>> Data.get_data()
# populates database on my local machine

From my reading, it seems I want to use either Celery or a cron job. I am unfamiliar with both, and using them with AWS seems quite complicated. This post here seems most relevant, but I am unsure how I would apply the suggestion to my example. Would I need to create a management command as mentioned, and what would this look like with my example?

As this is new to me, it would help a lot if someone could point me down the right path.

pjdavis

1 Answer

How to create a management command is covered in detail in the Django docs. The following provides a management command called foobar.

project_root/app_name/management/commands/foobar.py

from django.core.management.base import BaseCommand, CommandError
from yourapp.models import Data

class Command(BaseCommand):
    help = 'Dump data'

    def handle(self, *args, **options):
        Data.get_data()

Please read the linked docs. For example, a few __init__.py files need to be present for Django to discover the command properly (in the layout the docs show, one in management/ and one in management/commands/).
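
As a quick sanity check of that layout, here is a minimal sketch (plain Python, not part of Django) that lists which __init__.py files are missing for a given app directory. The helper name missing_init_files and the app_name path are hypothetical, and it assumes the conventional layout with an __init__.py in both management/ and management/commands/.

```python
from pathlib import Path


def missing_init_files(app_dir):
    """Return the __init__.py paths that are expected but absent.

    Hypothetical helper: assumes the conventional layout
    app_dir/management/__init__.py and
    app_dir/management/commands/__init__.py.
    """
    app = Path(app_dir)
    expected = [
        app / "management" / "__init__.py",
        app / "management" / "commands" / "__init__.py",
    ]
    return [str(p) for p in expected if not p.is_file()]


if __name__ == "__main__":
    # "app_name" is a placeholder for your own app directory.
    for path in missing_init_files("app_name"):
        print("missing:", path)
```

If it prints nothing, both files are in place and python manage.py help should list the command under your app.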

When your project is deployed to your Elastic Beanstalk environment, it should be connected to the proper database, so the data gets stored there.

To configure the cron job, follow the instructions from your linked question. There is also "AWS Elastic Beanstalk, running a cronjob", which covers the topic in more detail.

The line in the crontab file should look like this:

0 * * * * /path/to/your/environment/bin/python /path/to/your/project_root/manage.py name_of_your_management_command > /path/to/your/cron.log 2>&1

I have never used Elastic Beanstalk, so the paths above are placeholders; each one is named after what it should point to. A few details regarding the cron line:

  • 0 * * * * runs the command when the minute is 0, in every hour (*), on every day of the month (*), in every month (*), and on every day of the week (*)
  • The next part is the command that should run
    • /path/to/your/environment/bin/python uses the Python from your project's environment
    • /path/to/your/project_root/manage.py invokes your project's manage.py
    • name_of_your_management_command is the management command to run (foobar in the example above)
    • > /path/to/your/cron.log 2>&1 writes all output from the command, STDOUT and STDERR, to the file /path/to/your/cron.log
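
To make the five schedule fields concrete, here is a small sketch that checks a cron expression against a point in time. It is only an illustration, not how cron itself is implemented: the function names are made up, it supports only * and plain integers (no ranges, lists, or step values), and it requires every field to match, whereas real cron treats day-of-month and day-of-week as alternatives when both are restricted.

```python
from datetime import datetime


def field_matches(field, value):
    """True if a single cron field ('*' or a plain integer) matches value."""
    return field == "*" or int(field) == value


def cron_matches(expr, when):
    """Check a five-field cron expression against a datetime.

    Simplified sketch: only '*' and plain integers are understood.
    """
    minute, hour, day, month, weekday = expr.split()
    return (
        field_matches(minute, when.minute)
        and field_matches(hour, when.hour)
        and field_matches(day, when.day)
        and field_matches(month, when.month)
        # cron counts Sunday as 0; isoweekday() counts Sunday as 7
        and field_matches(weekday, when.isoweekday() % 7)
    )


if __name__ == "__main__":
    print(cron_matches("0 * * * *", datetime(2024, 1, 1, 13, 0)))   # True
    print(cron_matches("0 * * * *", datetime(2024, 1, 1, 13, 30)))  # False
```

With 0 * * * * only the minute field is restricted, so the command fires once per hour, on the hour.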
dahrens