I am trying to create a song-artist-album relationship in Django. I have the following models:

class Artist(models.Model):
    gid = models.CharField(max_length=63, blank=True)
    name = models.CharField(max_length=255, blank=True)
    begin_life = models.CharField(max_length=31, blank=True)
    end_life = models.CharField(max_length=31, blank=True)
    type = models.CharField(max_length=1, blank=True)
    gender = models.CharField(max_length=1, blank=True)

class Song(models.Model):
    gid = models.CharField(max_length=63, blank=True)
    title = models.CharField(max_length=255, blank=True)
    artist = models.ForeignKey('Artist', related_name='songs_artist')
    album = models.ForeignKey('Album', related_name='songs_album')
    length = models.IntegerField(default=0)

My ArtistSerializer is meant to return all of an artist's songs along with the details of that artist. These are the serializers I have created:

class ArtistSerializer(serializers.ModelSerializer):
    songs_artist = SongSerializer(source='songs_artist')
    class Meta:
        model = Artist
        fields = ('name', 'type', 'gender', 'begin_life', 'end_life', 'songs_artist')

class SongSerializer(serializers.ModelSerializer):
    artist = SongArtistSerializer()
    album = SongAlbumSerializer()
    class Meta:
        model = Song
        fields = ('id', 'title', 'artist', 'album', 'length')

class SongArtistSerializer(serializers.ModelSerializer):
    class Meta:
        model = Artist
        fields = ('id', 'name')

Quick profiling of the GET request for an artist revealed some troubling numbers. Here are the profiler results, ordered by time and by number of calls: http://pastebin.com/bwcKsn2i.

But when I removed the songs_artist field from my serializer, the profiler output was: http://pastebin.com/0s5k4w7i.

If I am reading this right, the database is hit 1240 times when I use source!

Is there an alternative way to do this?

Thanks in advance.

R4chi7

2 Answers

Django REST Framework will not optimize your queries for you; it's up to you to decide how best to remove any N+1 queries. You should follow the guidelines in the Django documentation for handling performance issues.

For forward ForeignKey relationships (such as a song's artist and album), use select_related in your query; it fetches the related objects through a JOIN in the original query.

For ManyToMany, GenericForeignKey, and reverse ForeignKey relationships (such as songs_artist on Artist), use prefetch_related instead. I've written quite a bit about this in another Stack Overflow answer, but the gist is that you use it much like select_related.

For best results, build the query in get_queryset on the view; that way you don't need to worry about Django REST Framework incorrectly cloning the queryset when it is set as an attribute on the class.

Kevin Brown-Silva
  • Thanks for the reply! I know about the `select_related` and similar methods. But how do I use them while serializing my data? Is there any alternative to this other than putting the `source`? – R4chi7 Nov 06 '14 at 17:02
  • @R4chi7 When you are setting up the queryset for the serializer, that is where you should be using `select_related` and `prefetch_related`. The serializers do not form their own queries, so this will help reduce the number of N+1 queries. – Kevin Brown-Silva Nov 06 '14 at 17:08
  • that makes more sense. Accepting your answer for now. :-) – R4chi7 Nov 06 '14 at 18:49

Serialising data with DRF does not guarantee an optimised number of database hits; you have to put in some effort to keep them to a minimum. Depending on the case, one of the following options may work best:

  • ModelSerializer
  • ReadOnlyModelSerializer
  • Serializer
  • ReadOnlySerializer

Here's a nice description of the above-mentioned use cases:

https://hakibenita.com/django-rest-framework-slow

Talat Parwez