
Imagine a simple Food model with a name and an expiration date. My goal is to automatically delete the object once the expiration date is reached.

I want objects to be deleted from the database (PostgreSQL in my case) right after exp_date is reached. I do not want to filter by exp_date__gt=datetime.datetime.now() in my code and then have cron/Celery run a script once in a while that filters by exp_date__lt=datetime.datetime.now() and deletes the matches.

class Food(models.Model):
    name = models.CharField(max_length=200)
    exp_date = models.DateTimeField()

*I could do it in a vanilla view when the object is accessed via an endpoint, or even with DRF like so:

from django.utils import timezone
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

class GetFood(APIView):

    def check_date(self, food):
        """
        Delete the food if it has expired; return True if it is still valid.
        """
        # timezone.now() is aware or naive depending on USE_TZ
        if food.exp_date <= timezone.now():
            food.delete()
            return False
        return True

    def get(self, request, *args, **kwargs):
        # One query instead of exists() followed by get()
        food = Food.objects.filter(pk=self.kwargs["id"]).first()

        if food is None or not self.check_date(food):
            return Response({"error": "not found"}, status.HTTP_404_NOT_FOUND)

        return Response({"food": food.name}, status.HTTP_200_OK)

but it would not delete the object if no one tries to access it via an endpoint.

*I could also set up a cron job (or even Celery) with a script that queries the database for every Food object whose expiration date is in the past and deletes them. It would only need to run once a day if I were using DateField, but as I am using DateTimeField it would need to run every minute (every second for the needs of my project).

*I've also thought of a fancy workaround using a post_save signal with a while loop, like:

@receiver(post_save, sender=Food)
def delete_after_exp_date(sender, instance, created, **kwargs):
    if created:
        while instance.exp_date > datetime.datetime.now():
            pass
        else:
            instance.delete()

I don't know if it'd work but it seems very inefficient (if someone could please confirm)

Voila, thanks in advance if you know some ways or some tools to achieve what I want to do, thanks for reading !

Heroe__
  • Is there a hard need to delete the object? Could the consuming parts of your application not just filter by exp_date__gt=datetime.datetime.now()? Otherwise put it on something that receives a https://docs.djangoproject.com/en/dev/ref/signals/#pre-init signal of an object (say the User one) - these fire constantly. To keep them sensible you could limit any db operation to a particular time (say only on every even second in a minute) – djangoat Dec 18 '20 at 15:33
  • If the "need of your project" is to run a delete every second, then run a delete every second. What is the issue? Does the delete take more than a second to run? Do you not know how to schedule things? Are you doubting whether your project really does need to do this? – jjanes Dec 18 '20 at 16:21
  • "Every second" means it would need to run every second to delete these objects immediately after exp_date is reached with 100/100 success rate. Where is the issue ? It would run every second for nothing even if no exp_date is going to expire in the next second, that's the issue and this topic is a way to determine how to auto delete an object after a certain date. I'm not doubting of anything and could use cron or celery to run a script every second without problem if that's really your concern. – Heroe__ Dec 18 '20 at 16:29
  • @Heroe__: automatically deleting will be more cumbersome. It means you somehow need something to "schedule" this, like a queueing mechanism. Even that will never be "exact", and even if you somehow would manage that, if you later alter the expiration time, then it will result in more trouble to "cancel" this and reschedule. The best way to implement this is to filter, you can do this more transparent with packages like [**`django-softdelete`**](https://github.com/scoursen/django-softdelete) and occasionally delete the objects effectively. – Willem Van Onsem Dec 18 '20 at 16:32
  • @WillemVanOnsem "exact" is of course a manner of speaking; give or take 1 second is fine in my case. I know about django-softdelete and don't really care about recovering data. Do you know a way to schedule a deletion after an object is created? Maybe I should just use a post_save signal and add an individual task to the Celery queue every time – Heroe__ Dec 18 '20 at 16:46
  • @Heroe__: but the `post_save` runs when you save the object. Not when you hit the expiration timestamp. Even if you somehow would run a thread that waits for the expiration time, then that thread would be "lost" if you stop the webserver, since that data is not persistent. If you use a persistent system like cron, then that other system should run (which is still a risk), but even if that was possible, if later the expiration date is set to something else, it will result in a lot of complex scenarios to cancel the current expiration date, and reschedule a new one. – Willem Van Onsem Dec 18 '20 at 16:49
  • @Heroe__: finally it is not efficient, since you each time use a query to remove a single object, often performance scales (more or less) in the *number* of queries, not that much what these queries do, since the overhead of constructing, sending, deserializing, serializing results, transferring over a network, etc. are quite large compared to the actual work the database management system itself does. – Willem Van Onsem Dec 18 '20 at 16:50
  • @WillemVanOnsem I mean use `post_save` to create a one-off task at the time of the individual expiration_date for each created object. This expiration_date isn't editable for what I want to do, so that wouldn't be an issue. I'm more concerned about performance, to be honest – Heroe__ Dec 18 '20 at 16:57
  • @Heroe__: exactly, but that `post_save` runs when you *save* the object. First of all, signals do *not* run for all creation events: for example, `bulk_create` will *not* trigger such signals in the first place (that is one of the many reasons why signals are an antipattern https://lincolnloop.com/blog/django-anti-patterns-signals/ ). But even if that was the case, spawning a thread will not work (since that is non-persistent), and if you use a scheduler service, you have to rely on an extra program which can fail, etc. – Willem Van Onsem Dec 18 '20 at 17:00
  • @Heroe__: you thus make an "active" component, which often is more error-prone, since it depends on all services being active, having a synchronized clock, etc., whereas passive components do not require this. If you filter records, then even if somehow the cronjob fails, you still retrieve only non-expired food. – Willem Van Onsem Dec 18 '20 at 17:01
  • @WillemVanOnsem filtering is what I've always done for this type of need and is the obvious way to do it; that's the first thing I specified in my post. But that's not what I want to do, I really need to delete from the database "almost" immediately after the exp_date is reached. But thanks anyway for pointing out what problems I'd potentially face with such an approach, I'm going to try to find solutions based on that. – Heroe__ Dec 18 '20 at 17:10

2 Answers

I would advise against deleting the objects, or at least against deleting them right at the expiration time. Scheduling tasks is cumbersome. Even if you manage to schedule this, the time at which an item is actually removed will always be slightly off from the time it was scheduled for. It also means you will make an extra query per element, and not remove the items in bulk. Furthermore, scheduling is inherently more complicated: it means you need something to persist the schedule. If the expiration date of some food is changed later, it will require extra logic to "cancel" the current schedule and create a new one. It also makes the system less "reliable": besides the webserver, the scheduler daemon has to run. If the daemon fails for some reason, expired food is no longer removed and your queries will return it.
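To give an idea of the bookkeeping involved, here is a rough sketch of a per-object scheduler in plain Python. The ExpiryScheduler class and its method names are hypothetical, the "current time" is passed in explicitly so the example is reproducible, and nothing here is persisted, which is exactly the weakness described above.

```python
import heapq
from datetime import datetime, timedelta


class ExpiryScheduler:
    """Hypothetical in-memory scheduler; everything is lost if the process stops."""

    def __init__(self):
        self._heap = []      # (exp_date, food_id) entries, earliest first
        self._current = {}   # food_id -> the exp_date that is currently valid

    def schedule(self, food_id, exp_date):
        # Rescheduling cannot cheaply remove the old heap entry, so the old
        # entry stays in the heap and is ignored later ("lazy cancellation").
        self._current[food_id] = exp_date
        heapq.heappush(self._heap, (exp_date, food_id))

    def pop_due(self, now):
        """Return ids whose current expiration date is at or before `now`."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            exp_date, food_id = heapq.heappop(self._heap)
            if self._current.get(food_id) == exp_date:  # skip stale entries
                del self._current[food_id]
                due.append(food_id)
        return due


start = datetime(2020, 12, 18, 15, 0, 0)
scheduler = ExpiryScheduler()
scheduler.schedule(1, start + timedelta(seconds=10))
scheduler.schedule(2, start + timedelta(seconds=20))
scheduler.schedule(1, start + timedelta(seconds=30))  # expiration date changed

due_at_25s = scheduler.pop_due(start + timedelta(seconds=25))  # only food 2
due_at_35s = scheduler.pop_due(start + timedelta(seconds=35))  # food 1, new date
```

Even this toy version needs a second data structure just to ignore outdated entries, and it still deletes one object at a time.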

Therefore it might be better to combine filtering the records such that you only retrieve food that did not expire, and remove at some regular interval Food that has expired. You can easily filter the objects with:

from django.db.models.functions import Now

Food.objects.filter(exp_date__gt=Now())

to retrieve Food that is not expired. To make it more efficient, you can add a database index on the exp_date field:

class Food(models.Model):
    name = models.CharField(max_length=200)
    exp_date = models.DateTimeField(db_index=True)

If you need to filter often, you can even work with a custom Manager:

from django.db.models.functions import Now

class FoodManager(models.Manager):

    def get_queryset(self, *args, **kwargs):
        return super().get_queryset(*args, **kwargs).filter(
            exp_date__gt=Now()
        )

class Food(models.Model):
    name = models.CharField(max_length=200)
    exp_date = models.DateTimeField(db_index=True)
    
    objects = FoodManager()

Now if you work with Food.objects you automatically filter out all Food that is expired.

Besides that you can make a script that for example runs daily to remove the Food objects that have expired:

from django.db.models.functions import Now

Food._base_manager.filter(exp_date__lte=Now()).delete()
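If you want to see the two halves of this approach without a Django project, the following sketch reproduces them with the standard-library sqlite3 module instead of the ORM. The table name, index name, sample rows and the fixed "now" timestamp are all made up for illustration; the SQL is roughly the kind of query the filter and the bulk delete above translate to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE food (id INTEGER PRIMARY KEY, name TEXT, exp_date TEXT)")
# Index on exp_date, the plain-SQL counterpart of db_index=True
conn.execute("CREATE INDEX food_exp_date_idx ON food (exp_date)")
conn.executemany(
    "INSERT INTO food (name, exp_date) VALUES (?, ?)",
    [
        ("milk", "2020-12-18 15:00:00"),
        ("rice", "2020-12-25 09:30:00"),
        ("eggs", "2020-12-18 16:45:00"),
    ],
)

# Fixed "current time" so the result is reproducible; ISO strings sort by date
now = "2020-12-18 17:00:00"

# Passive part: readers only ever see food that has not expired.
visible = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM food WHERE exp_date > ? ORDER BY name", (now,)
    )
]

# Active part: one bulk DELETE removes every expired row in a single query.
deleted = conn.execute("DELETE FROM food WHERE exp_date <= ?", (now,)).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM food").fetchone()[0]
```

The point of the split is that the SELECT stays correct even if the DELETE never runs; the cleanup only keeps the table small.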
Willem Van Onsem

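A note on the accepted answer: get_queryset needs self as its first parameter. Without it (or if you define the method outside the class), the zero-argument super() call fails with "RuntimeError: super(): no arguments". I found this answer helpful.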

As per PEP 3135, which introduced the "new super":

The new syntax:

super()

is equivalent to:

super(__class__, <firstarg>)

where __class__ is the class that the method was defined in, and <firstarg> is the first parameter of the method (normally self for instance methods, and cls for class methods).

While super is not a reserved word, the parser recognizes the use of super in a method definition and only passes in the class cell when this is found. Thus, calling a global alias of super without arguments will not necessarily work.

As such, you will still need to include self:

class FoodManager(models.Manager):

    def get_queryset(self, *args, **kwargs):
        return super().get_queryset(*args, **kwargs).filter(
            exp_date__gt=Now()
        )
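For anyone who wants to reproduce the error without Django, here is a small plain-Python sketch. Base, BrokenManager and FixedManager are made-up stand-ins for models.Manager and its subclasses; the only thing being demonstrated is how zero-argument super() reacts to a method that has no explicit first parameter.

```python
class Base:
    def get_queryset(self):
        return ["apple", "bread"]


class BrokenManager(Base):
    # No explicit first parameter: the instance ends up inside *args,
    # and zero-argument super() has nothing to bind to.
    def get_queryset(*args, **kwargs):
        return super().get_queryset(*args, **kwargs)


class FixedManager(Base):
    def get_queryset(self, *args, **kwargs):
        return super().get_queryset(*args, **kwargs)


try:
    BrokenManager().get_queryset()
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

fixed_result = FixedManager().get_queryset()
```

Here error_message holds the RuntimeError text from the broken variant, while fixed_result is the list returned by Base.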

Just something to keep in mind.

Josh Crouse