
We have a Django application that uses django-river for workflow management. To improve performance, we had to switch to bulk_create. We need to insert data into a couple of tables, several rows in each. Initially, we used the normal .save() method and the workflow worked as expected (the post_save signals were being sent properly). Once we moved to bulk_create, insert time dropped from minutes to seconds, but django-river stopped working because bulk_create does not send post_save signals. We had to send the signals ourselves, based on the available documentation.

class CustomManager(models.Manager):
    def bulk_create(self, items, **kwargs):
        created = super().bulk_create(items, **kwargs)
        for i in created:
            ...  # code to send signal
        return created

And

class Task(models.Model):
    objects = CustomManager()
    ....

This got the workflow working again, but sending the signals takes time and wipes out the performance improvement gained with bulk_create. Is there a way to speed up the signal sending?

More details

from django.db import models
from django.db.models.signals import post_save


def post_save_fn(obj):
    # Helper for the commented-out threaded variant below.
    post_save.send(obj.__class__, instance=obj, created=True)


class CustomManager(models.Manager):
    def bulk_create(self, objs, **kwargs):
        # bulk_create() does not send post_save, so send it manually per object.
        data_obj = super().bulk_create(objs, **kwargs)
        for i in data_obj:
            # t1 = threading.Thread(target=post_save_fn, args=(i,))
            # t1.start()
            post_save.send(i.__class__, instance=i, created=True)
        return data_obj


class Test(Base):
    test_name = models.CharField(max_length=100)
    test_code = models.CharField(max_length=50)
    objects = CustomManager()

    class Meta:
        db_table = "test_db"
kallada
  • I think it depends on what your post_save signals do. Maybe instead of calling the callback function for each item, you can create a function that does everything that's needed in a single call. Can you post your signals? – lucutzu33 Oct 03 '21 at 15:16
  • @Ene I have added a sample snippet. Thank you – kallada Oct 03 '21 at 17:32
  • Sorry, I meant the receiver functions. – lucutzu33 Oct 03 '21 at 17:45
  • @Ene django-river is the receiver, and I don't have much control over it. But the generation of signals is very costly for me. Thank you – kallada Oct 03 '21 at 18:35
  • It's not the generation of signals that is costly. The moment you "generate" a signal, it is processed by `django-river` right then. What you're doing here is almost exactly what calling `.save()` on individual instances would do (except for a single `INSERT` versus multiple). – Krzysztof Szularz Oct 06 '21 at 08:36
  • Seems like an issue on the receiver end. For the time being, replace django-river's receiver function with a bare-minimum receiver function and check the execution speed. If you get the speed you want, the issue lies in the receiver. – JPG Oct 07 '21 at 01:30
  • If the signal handler makes database queries, then this will indeed slow the application down comparable to no `bulk_create`, simply because it generates an *N+1* problem again. This is one of the many reasons why signals are not very useful. – Willem Van Onsem Oct 08 '21 at 16:12

2 Answers


What is the problem?

As others have mentioned in the comments, the problem is that the functions called via the post_save signal take a long time. (Remember that signals are not async! This is a common misconception.)

I'm not familiar with django-river, but a quick look at the functions that get called post-save (see here and here) shows that they involve additional calls to the database.

While you save a lot of individual db hits by using bulk_create, you are still calling the database multiple times for each post_save signal.
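
To make the arithmetic concrete, here is a toy model in plain Python (no Django involved). `FakeDatabase`, `receiver` and `bulk_create_with_signals` are hypothetical names invented for this sketch, and the "two queries per receiver" figure is an assumption chosen for illustration, not a measurement of django-river.

```python
# Toy model (not Django): count "database calls" for one bulk insert
# followed by a per-row receiver that itself queries the database.


class FakeDatabase:
    """Hypothetical stand-in that only counts round trips."""

    def __init__(self):
        self.calls = 0

    def bulk_insert(self, rows):
        self.calls += 1  # one round trip for all rows
        return list(rows)

    def query(self, row):
        self.calls += 1  # one round trip per call
        return row


def receiver(db, row):
    # Stand-in for a post_save receiver; assumed to make two queries per row.
    db.query(row)
    db.query(row)


def bulk_create_with_signals(db, rows):
    created = db.bulk_insert(rows)
    for row in created:
        # Synchronous: each receiver finishes before the next one starts.
        receiver(db, row)
    return created


db = FakeDatabase()
bulk_create_with_signals(db, range(1000))
print(db.calls)  # 1 bulk insert + 2 * 1000 receiver queries = 2001
```

The bulk insert accounts for one call out of 2001 here, so the per-row receivers dominate the total no matter how fast the insert itself is.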

What can be done about it?

In short: not much! For the vast majority of Django requests, the slow part is calling the database. This is why we try to minimise the number of calls to the db (using things like bulk_create).

Reading through the first few paragraphs of django-river the whole idea is to move things that would normally be in code to the database. The big advantage here is that you don't need to re-write code and re-deploy so often. But the disadvantage is that you're inevitably going to have to refer to the database more, which is going to slow things down. This will be fine for some use-cases, but not all.

There are two things I can think of which might help:

  • Does all of this currently happen as part of the request/response cycle? And if it does, does it need to? If the answers to these two questions are 'yes' and 'no' respectively, then you could move this work to a separate task queue. This will still be slow, but at least it won't slow down your site.
  • Depending on exactly what your workflows are and the nature of the data you are creating, it might be the case that you can do everything that the post_save signals are doing in your own function, and do it more efficiently. But this will definitely depend upon your data, and your app, and will move away from the philosophy of django-river.
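
The first suggestion can be sketched with the standard library alone. This is only an illustration of the shape of the idea: `slow_receiver`, `bulk_create_deferred` and the in-process `queue.Queue` are assumptions made for the sketch, and a real deployment would more likely hand the work to a dedicated task queue running in a separate process.

```python
import queue
import threading

task_queue = queue.Queue()
processed = []


def slow_receiver(obj_id):
    # Hypothetical stand-in for whatever the post_save receivers do.
    processed.append(obj_id)


def worker():
    # Pull object ids off the queue until the None sentinel arrives.
    while True:
        obj_id = task_queue.get()
        try:
            if obj_id is None:
                break
            slow_receiver(obj_id)
        finally:
            task_queue.task_done()


def bulk_create_deferred(obj_ids):
    # The "request" only enqueues the work and returns immediately.
    for obj_id in obj_ids:
        task_queue.put(obj_id)
    return len(obj_ids)


thread = threading.Thread(target=worker, daemon=True)
thread.start()

queued = bulk_create_deferred([1, 2, 3])

# Only for this demo: stop the worker and wait so the result can be inspected.
task_queue.put(None)
task_queue.join()
thread.join()
```

The total work is unchanged; it has only moved off the path the user is waiting on. Whether the receivers tolerate running later, after the rows are committed, depends on the workflow.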
tim-mccurrach

Use a separate worker if the "signal" logic can be executed after the bulk save.

You can create an additional queue table and store metadata there describing what your future worker needs to do.

Create a separate worker (a Django module) that holds the needed logic and reads its data from the queue table. You can implement it as a management command, which lets you run the worker from the main flow (management commands can be run from regular Django code) or from crontab on a schedule.

How to run such a worker?

If you need the work done as soon as possible after the records are created, run it in a separate thread using the threading module. The request-response cycle can then finish right after you've started the new thread.

Otherwise, if it can be done later, schedule it with crontab using the management command framework.
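
A minimal sketch of the queue-table idea, using the standard library's sqlite3 in place of Django models so that it runs on its own. The table name `signal_queue`, its columns, and the functions `enqueue` and `run_worker` are all hypothetical; in a Django project the table would normally be a model and `run_worker` the body of a management command.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE signal_queue ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " object_id INTEGER NOT NULL,"
    " done INTEGER NOT NULL DEFAULT 0)"
)


def enqueue(conn, object_ids):
    # Record the pending work for all newly created rows in one call.
    conn.executemany(
        "INSERT INTO signal_queue (object_id) VALUES (?)",
        [(oid,) for oid in object_ids],
    )
    conn.commit()


def run_worker(conn, handler):
    # Drain the pending rows; a thread or a cron-driven command would call this.
    rows = conn.execute(
        "SELECT id, object_id FROM signal_queue WHERE done = 0 ORDER BY id"
    ).fetchall()
    for queue_id, object_id in rows:
        handler(object_id)
        conn.execute(
            "UPDATE signal_queue SET done = 1 WHERE id = ?", (queue_id,)
        )
    conn.commit()
    return len(rows)


handled = []
enqueue(conn, [10, 11, 12])
first = run_worker(conn, handled.append)   # processes the three pending rows
second = run_worker(conn, handled.append)  # nothing left to do
```

Marking rows as done rather than deleting them keeps a record of what has been processed. That is a design choice made for this sketch, not something the answer prescribes.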

fanni