
I have a function that needs to run in the background on one of my web applications.

I implemented a custom AppConfig as shown below:

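import sys

from django.apps import AppConfig


class MyAppConfig(AppConfig):
    run_already = False

    def ready(self):
        # Deferred import: app modules shouldn't be imported before the app registry is ready.
        from .tasks import update_products
        # Skip processes launched as "python manage.py ..." and calls after the first one.
        if "manage.py" not in sys.argv and not self.run_already:
            self.run_already = True
            update_products()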

However, the update_products() call is executed twice.

As stated in the documentation:

In the usual initialization process, the ready method is only called once by Django. But in some corner cases, particularly in tests which are fiddling with installed applications, ready might be called more than once. In that case, either write idempotent methods, or put a flag on your AppConfig classes to prevent re-running code which should be executed exactly one time.

I feel like I am following what the documentation says to do. What gives?
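A note on the flag itself: in the code above, self.run_already = True creates an attribute on that one AppConfig instance, so it only helps if ready() is called again on the same instance. Setting the flag on the class covers the in-process corner case the documentation describes even if a fresh instance is created. The sketch below uses a plain stand-in class (FakeAppConfig is hypothetical, no Django required) to show the class-level version. Neither version helps when ready() runs in two separate processes, such as runserver's autoreloader or several gunicorn workers.

```python
class FakeAppConfig:
    """Hypothetical stand-in for a Django AppConfig subclass."""

    run_already = False  # class attribute, shared by every instance
    calls = 0            # counts how often the one-time work actually ran

    def ready(self):
        # Read and write the flag on the class, not the instance:
        # "self.run_already = True" would only create an instance attribute,
        # so a second instance would still see the class value False.
        cls = type(self)
        if not cls.run_already:
            cls.run_already = True
            cls.calls += 1  # stands in for update_products()


FakeAppConfig().ready()
FakeAppConfig().ready()
print(FakeAppConfig.calls)  # 1
```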

dmcmulle
  • Have the same problem. Did you solve it? – Pavel Bernshtam Sep 02 '17 at 17:46
  • 1
    @PavelBernshtam, if I remember correctly, it was the gunicorn running multiple threads. When I changed to waitress for hosting, the problem went away. I didn't even include the 'run_already=False','and not self.run_already' code at all. – dmcmulle Sep 06 '17 at 20:34

5 Answers


As stated in this answer, if you run your app with Django's python manage.py runserver command, your application starts twice: once to validate your models, and once to run your app.

You can prevent this by passing the --noreload option to runserver (this also turns off automatic reloading when your code changes).
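If you want to keep the autoreloader, a common alternative is to check the RUN_MAIN environment variable: Django's autoreloader sets RUN_MAIN to "true" in the child process that actually serves requests, while the parent process only watches files. The sketch below assumes that behaviour; is_serving_process is a hypothetical helper, and it deliberately returns True outside runserver (for example under gunicorn), where this check does not apply.

```python
import os
import sys


def is_serving_process(environ=None, argv=None):
    """Return True if this process should run one-time startup work.

    Hypothetical helper. Under "manage.py runserver" with the autoreloader,
    only the child process (where Django sets RUN_MAIN="true") serves
    requests. With --noreload there is no child, so the single process counts.
    """
    environ = os.environ if environ is None else environ
    argv = sys.argv if argv is None else argv
    if "runserver" not in argv:
        # Not the development server; this check says nothing about it.
        return True
    if "--noreload" in argv:
        return True
    return environ.get("RUN_MAIN") == "true"


# In apps.py (sketch):
#     def ready(self):
#         if is_serving_process():
#             update_products()
```

Note that the autoreloader restarts the child process on every code change, so the task runs again after each reload.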

Benjy Malca

On Heroku, gunicorn is started with more than one worker process, and each worker loads the Django app separately. Set WEB_CONCURRENCY to 1:

heroku config:set WEB_CONCURRENCY=1

(see Basic configuration)

Martin Evans
  • this isn't really a solution when you want multiple threads but need initialization code to run once. – jpro Dec 03 '20 at 18:40
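One way to keep several gunicorn workers on the same machine while still running the initialization once is an exclusive, non-blocking file lock. This is a swapped-in technique, not part of the answer above: the first worker to take the lock runs the task and the others skip it. A sketch, assuming a POSIX system (fcntl is Unix-only) and a writable lock path; try_acquire_startup_lock and the path are hypothetical.

```python
import fcntl
import os
import tempfile


def try_acquire_startup_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken.

    Hypothetical helper. Keep the returned file object referenced for the
    life of the process: closing it (or the process exiting) releases the lock.
    """
    fh = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        fh.close()
        return None
    return fh


# Usage sketch: a second attempt fails while the first lock is held.
lock_path = os.path.join(tempfile.mkdtemp(), "startup.lock")
first = try_acquire_startup_lock(lock_path)
second = try_acquire_startup_lock(lock_path)
print(first is not None, second is None)  # True True

# In apps.py (sketch, hypothetical lock path):
#     def ready(self):
#         lock = try_acquire_startup_lock("/tmp/myproject-startup.lock")
#         if lock is not None:
#             type(self)._startup_lock = lock  # keep it open
#             update_products()
```

The lock only coordinates processes on one machine, so with several dynos each dyno would still run the task once.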

I found that AppConfig.ready() is triggered twice, which causes the scheduler to be initialized twice with this kind of setup. Instead, instantiate the scheduler in urls.py, like this:

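from apscheduler.schedulers.background import BackgroundScheduler
from django.conf import settings
from django.urls import include, path
from rest_framework_simplejwt import views as jwt_views  # assumed: djangorestframework-simplejwt

# router, CustomTokenObtainPairView and task come from this project's own modules.

urlpatterns = [
    path('api/v1/', include(router.urls)),
    path('api/v1/login/', CustomTokenObtainPairView.as_view(), name='token_obtain_pair'),
    path('api/v1/login/refresh/', jwt_views.TokenRefreshView.as_view(), name='token_refresh'),
    path('api/v1/', include('rest_registration.api.urls')),
]

# Module-level code: runs when Django first imports this URLconf.
scheduler = BackgroundScheduler()
scheduler.add_job(task.run, trigger='cron', hour=settings.TASK_RUNNER_HOURS, minute=settings.TASK_RUNNER_MINUTES, max_instances=1)
scheduler.start()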

This way the scheduler is instantiated only once per process. Problem fixed.
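The urls.py approach relies on Python importing a module once per process and caching it in sys.modules, so module-level code runs once per process. The same per-process guarantee can be made explicit with a guarded module-level singleton, which is then safe to call from ready() more than once. A sketch with hypothetical names: start_scheduler_once takes a factory (for example APScheduler's BackgroundScheduler after adding jobs), and FakeScheduler is a test double. Like the urls.py version, it does not coordinate between separate processes such as multiple gunicorn workers.

```python
import threading

_scheduler = None
_scheduler_lock = threading.Lock()


def start_scheduler_once(factory):
    """Create and start a scheduler the first time this is called in a process.

    Later calls in the same process return the already-running instance.
    """
    global _scheduler
    with _scheduler_lock:  # guards against two threads racing on first call
        if _scheduler is None:
            scheduler = factory()
            scheduler.start()
            _scheduler = scheduler
    return _scheduler


class FakeScheduler:
    """Test double standing in for a real scheduler."""

    starts = 0

    def start(self):
        FakeScheduler.starts += 1


first = start_scheduler_once(FakeScheduler)
second = start_scheduler_once(FakeScheduler)
print(first is second, FakeScheduler.starts)  # True 1
```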

Rajesh Panda

A class-level flag doesn't work here. Django runs twice, in two separate processes, and class-level variables in one process are not visible to the other. Use a flag from a database table instead, as in this code. (SchedulerUtils is a class I wrote with a go() method that starts a background APScheduler scheduler. The model uses a row in the scheduler_schedulerinfo table, so you have to insert this row first: "INSERT INTO scheduler_schedulerinfo (started) VALUES (0);")

# apps.py
import os

from django.apps import AppConfig

from scheduler.utils import SchedulerUtils


class SchedulerConfig(AppConfig):
    name = 'scheduler'

    def ready(self):
        # Imported here: models can't be imported before the app registry is ready.
        from scheduler.models import SchedulerInfo

        start_scheduler = True
        pid = os.getpid()

        # The DYNO environment variable is only set on Heroku.
        if os.environ.get("DYNO"):
            # On Heroku, ready() runs twice (once in each process).
            print("[%s] DYNO env var exists, running on Heroku" % pid)
            sched_info = SchedulerInfo.objects.all().first()
            if sched_info.started == 0:
                print("[%s] Scheduler not started, starting..." % pid)
                start_scheduler = True
                # Set the flag to 1 so the other process skips startup.
                SchedulerInfo.objects.all().update(started=1)
            else:
                print("[%s] Scheduler already running, not starting." % pid)
                start_scheduler = False  # already running
                # Reset the flag to 0 for next time.
                SchedulerInfo.objects.all().update(started=0)

        # Print the current flag value.
        sched_info = SchedulerInfo.objects.all().first()
        print("[%s] Value of flag schedulerinfo.started: %d" % (pid, sched_info.started))

        if start_scheduler:
            su = SchedulerUtils()
            su.go()


# models.py
from django.db import models


class SchedulerInfo(models.Model):
    started = models.IntegerField(default=0)
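The read-then-update above leaves a window in which both processes can read started == 0 before either writes. A single conditional UPDATE closes that window, because the database applies it atomically and at most one caller sees one affected row. This sketch uses the standard library's sqlite3 against the same table name so it runs on its own; try_claim_scheduler is hypothetical. In Django the equivalent check would be SchedulerInfo.objects.filter(started=0).update(started=1) == 1, since QuerySet.update() returns the number of matched rows.

```python
import sqlite3


def try_claim_scheduler(conn):
    """Flip started from 0 to 1; return True only for the caller that flipped it."""
    with conn:  # commits the transaction on success
        cur = conn.execute(
            "UPDATE scheduler_schedulerinfo SET started = 1 WHERE started = 0"
        )
    # rowcount is 1 for the winner and 0 for anyone who finds started == 1.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scheduler_schedulerinfo "
    "(id INTEGER PRIMARY KEY, started INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO scheduler_schedulerinfo (started) VALUES (0)")
conn.commit()

print(try_claim_scheduler(conn))  # True: first caller wins
print(try_claim_scheduler(conn))  # False: flag already set
```

As with the original approach, the flag still has to be reset to 0 before the next start, otherwise no process will claim it after a restart.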

Another solution could be checking gunicorn pids as below:

import os

import psutil
from django.apps import AppConfig


class SchedulerConfig(AppConfig):
    name = 'scheduler'

    # I want to start the scheduler only once.
    # If WEB_CONCURRENCY is set and greater than 1, start the scheduler
    # only if the PID of this gunicorn process is the same as the
    # maximum PID of all gunicorn processes.
    def ready(self):
        start_scheduler = True
        my_pid = os.getpid()

        # Check that WEB_CONCURRENCY exists and is greater than 1.
        web_concurrency = os.environ.get("WEB_CONCURRENCY")
        if web_concurrency:
            print("[%s] WEB_CONCURRENCY exists and is set to %s" % (my_pid, web_concurrency))
            gunicorn_workers = int(web_concurrency)
            if gunicorn_workers > 1:
                # Caveat: workers that have not been forked yet are not visible here.
                start_scheduler = self.get_max_running_gunicorn_pid() == my_pid

        if start_scheduler:
            print("[%s] WILL START SCHEDULER" % my_pid)
        else:
            print("[%s] WILL NOT START SCHEDULER" % my_pid)

    def get_max_running_gunicorn_pid(self):
        max_pid = -1
        # process_iter skips processes that exit mid-iteration;
        # a name that can't be read comes back as None and is ignored.
        for proc in psutil.process_iter(["pid", "name"]):
            if proc.info["name"] == "gunicorn" and proc.info["pid"] > max_pid:
                max_pid = proc.info["pid"]
        print("Max gunicorn PID: %s" % max_pid)
        return max_pid