0

Im trying to remove records from a table that have a duplicate value by their oldest timestamp(s), grouping by ID, so the results would be unique values per ID with the newest unique values per ID/timestamp kept, hopefully the below samples will make sense.

sample data:

id     value    timestamp
10     10       9/4/20 17:00
11     17       9/4/20 17:00
21     50       9/4/20 17:00
10     10       9/4/20 16:00
10     10       9/4/20 15:00
10     11       9/4/20 14:00
11     41       9/4/20 16:00
11     41       9/4/20 15:00
21     50       9/4/20 16:00

so id like to remove any values that have a dupliate value with the same id, keeping the newest timestamps, so the above data would become:

id     value    timestamp
10     10       9/4/20 17:00
11     17       9/4/20 17:00
21     50       9/4/20 17:00
10     11       9/4/20 14:00
11     41       9/4/20 16:00

EDIT:

query is just

SampleData.objects.all()
AlexW
  • 2,843
  • 12
  • 74
  • 156

1 Answers1

1

One approach could be using Subquery expressions as documented here.

Suppose your SampleData model looks like this:

class SampleData(models.Model):

    id2 = models.IntegerField()
    value = models.IntegerField()
    timestamp = models.DateTimeField()

(I replaced id by id2 to avoid conflicts with the model id).

Then you could delete your duplicates like this:

newest = SampleData.objects.filter(id2=OuterRef('id2'), value=OuterRef('value')).order_by('-timestamp')
SampleData.objects.annotate(newest_id=Subquery(newest.values('pk')[:1])).exclude(pk=F('newest_id')).delete()

Edit:

It seems as if MySQL has some issues handling deletions and subqueries, as documented in this SO post.

In this case a 2 step approach should help: First getting the ids of the objects to delete and then deleting them:

 newest = SampleData.objects.filter(id2=OuterRef('id2'), value=OuterRef('value')).order_by('-timestamp')
 ids2delete = list(SampleData.objects.annotate(newest_id=Subquery(newest.values('pk')[:1])).exclude(pk=F('newest_id')).values_list('pk', flat=True))
 SampleData.objects.filter(pk__in=ids2delete).delete()
Chris
  • 2,162
  • 1
  • 6
  • 17
  • when I try run the delete command I get django.db.utils.OperationalError: (1093, "You can't specify target table 'monitoring_data' for update in FROM clause") – AlexW Apr 17 '20 at 15:41
  • What Database are you using? – Chris Apr 17 '20 at 16:06
  • hi, using mysql – AlexW Apr 19 '20 at 09:46
  • Sorry, there I missed converting the ```values_list``` into a native python ```list```. See my update of the second approach. Tested it with MySQL and this should not throw that error – Chris Apr 20 '20 at 15:15