0

I have a model that looks like this:

class Keyword(models.Model):
    name = models.CharField(unique=True)

class Post(models.Model):
    title = models.CharField()
    keywords = models.ManyToManyField(
        Keyword, related_name="posts_that_have_this_keyword"
    )

Now I want to migrate all Posts of a wrongly named Keyword to a new properly named Keyword. And there are multiple wrongly named Keywords.

I can do the following but it leads to a number of SQL queries.

for keyword in Keyword.objects.filter(is_wrongly_named=True).iterator():
    old = keyword
    new, _ = Keyword.objects.get_or_create(name='some proper name')
    for note in old.notes_that_have_this_keyword.all():
        note.keywords.add(old)
    old.delete()

Is there a way I can achieve this while minimizing the SQL queries executed?

I prefer Django ORM solution to a raw SQL one, because I jumped right into the Django ORM without studying deep into SQL, not so familiar with SQL.

Thank you.

shinhong
  • 406
  • 4
  • 13

1 Answers1

1

If you want to perform bulk operations with M2M relationships I suggest that you act directly on the table that joins the two objects. Django allows you to access this otherwise anonymous table by using the through attribute on the M2M attribute on an object.

So, to get the table that joins Keywords and Posts you could reference either Keyword.posts_that_have_this_keyword.through or Post.keywords.through. I'd suggest you assign a nicely named variable to this like:

KeywordPost = Post.keywords.through

Once you get a hold onto that table bulk operations can be performed.

bulk remove bad entries

KeywordPost.objects.filter(keyword__is_wrongly_named=True).delete()

bulk create new entries

invalid_keyword_posts = KeywordPost.objects.filter(keyword__is_wrongly_named=True)
post_ids_to_update = invalid_keyword_posts.values_list("post_id", flat=True)
new_keyword_posts = [KeywordPost(post_id=p_id, keyword=new_keyword) for p_id in post_ids_to_update]
KeywordPost.objects.bulk_create(new_keyword_posts)

Basically you get access to all the features that the ORM provides on this join table. You should be able to achieve much better performance that way.

You can read up more on the through attribute here: https://docs.djangoproject.com/en/3.0/ref/models/fields/#django.db.models.ManyToManyField.through

Good luck!

Cameron Cairns
  • 173
  • 1
  • 7