1

I have this class name: Maybe it is the

class Follower(models.Model):
    language = models.ManyToManyField(
        "instagram_data.Language", verbose_name=_("language_code_name"), blank=True)
class Language(models.Model):
    code_name = models.CharField(max_length=6, null=True, blank=True)

    def __str__(self):
        return self.code_name

While searching for specific name it return values fast (few sec), (i.g Follower_obj.language.code_name = "he"): he lang

While adding the check for Follower_obj.language.code_name != "ar", it can take few minutes !

Maybe design issue ?

EDIT: Running Explain: So I first minimized the queryset by filter _is_verified than run the slow language check:

        qs1 = Follower.objects.filter(Q(is_verified=True))
        print(qs1.explain())

        qs2 = qs1.filter(~Q(language__code_name='ar'))
        print(qs2.explain())
Gather  (cost=1000.00..1216249.91 rows=7960 width=979)
  Workers Planned: 2
  ->  Parallel Seq Scan on instagram_data_follower  (cost=0.00..1214453.91 rows=3317 width=979)
        Filter: is_verified
Gather  (cost=6135.18..92362779347.15 rows=3980 width=979)
  Workers Planned: 2
  ->  Parallel Seq Scan on instagram_data_follower  (cost=5135.18..92362777949.15 rows=1658 width=979)
        Filter: (is_verified AND (NOT (SubPlan 1)))
        SubPlan 1
          ->  Materialize  (cost=5135.18..56725.50 rows=157291 width=4)
                ->  Nested Loop  (cost=5135.18..55324.05 rows=157291 width=4)
                      ->  Seq Scan on instagram_data_language u2  (cost=0.00..1.64 rows=1 width=4)
                            Filter: ((code_name)::text = 'ar'::text)
                      ->  Bitmap Heap Scan on instagram_data_follower_language u1  (cost=5135.18..53615.64 rows=170677 width=8)
                            Recheck Cond: (language_id = u2.id)

SQL EXECUTED:

SELECT * FROM "instagram_data_follower" WHERE NOT ("instagram_data_follower"."id" IN (SELECT U1."follower_id" FROM "instagram_data_follower_language" U1 INNER JOIN "instagram_data_language" U2 ON (U1."language_id" = U2."id") WHERE U2."code_name" = ar)))

What am I see here?

EDIT:

Lookup is faster by searching id in Follower instead of char. but still not fast as I want it to be. Not sure what "not-equal" best design optimization is.

flice com
  • 125
  • 7

2 Answers2

1

You need to get SQL explanation for your queries, look at this pseudocode:

qs1 = (Follower_obj.language.code_name = "he")
qs1.explain()

qs2 = (Follower_obj.language.code_name != "ar")
qs2.explain()

see EXPLAIN for info. Possibly you might need .prefetch_related('language') optimization, but better check Django DB optimization docs first.

frost-nzcr4
  • 1,540
  • 11
  • 16
  • Added Explain results – flice com Apr 18 '21 at 08:49
  • 1
    Now we can see the nested loop that's perfect for a bottleneck. Since there are only a few languages they could be prefetched to optimize the nested loops. Also you should try to get that_language_id for "ar" and then run `qs1.filter(~Q(language_id=that_language_id))` to avoid joining the tables and getting the loops. – frost-nzcr4 Apr 18 '21 at 21:04
  • @frost-nzcr4 how do we get the id out of the name ? where is it stored ? And let me get it - the issue is that each query need to check the != to the char 'ar' so it need to retrieve the language object each time for each row and check the char? why is it different from == isn't that the same case? print(Language.objects.all()) return – Sion C Apr 19 '21 at 21:37
  • print(Language.objects.all()) return , , , so 'ar' is 1 ? – Sion C Apr 19 '21 at 21:43
  • Is your Language table has only unique languages? Because code_name doesn't primary key or unique it could not be optimized for scan. An id field https://docs.djangoproject.com/en/3.2/topics/db/models/#automatic-primary-key-fields is created automatically by Django and you can get optimization using it in filter. Try primary_key=True or uniqie=True. – frost-nzcr4 Apr 20 '21 at 09:44
  • Is your Language table has only unique languages? YES I guess I can summarize to 130 options. But unique will leave me with 1 language per followers no ? will the search for id instead of char_code for sure make it faster ? since I still see slow search – flice com Apr 22 '21 at 12:09
  • Please see sql executed. I think the id is not the issue – flice com Apr 22 '21 at 19:08
  • 1
    You have select within select. Try to optimize it with `distinct` to avoid follower_id duplications. Some RDBMS have optimizations for nested selects but for the short one like 10, 100, maybe 1000 rows, but when you go through that boundaries the nested select will be triggered on each row. – frost-nzcr4 Apr 22 '21 at 23:33
  • 1
    To avoid this you might create temporary table with only followers without "ar" language like this `CREATE TEMPORARY TABLE IF NOT EXISTS followers_without_ar_language AS (SELECT DISTINCT idfl.follower_id FROM instagram_data_follower_language idfl INNER JOIN "instagram_data_language" idl ON idl.id = idfl.language_id WHERE idl.code_name != "ar" GROUP BY idfl.follower_id`) and then LEFT JOIN instagram_data_follower to each row. This kind of magic possible via raw Django queries https://docs.djangoproject.com/en/3.2/topics/db/sql/ – frost-nzcr4 Apr 22 '21 at 23:33
  • How do I add a custom SQL to django admin is it only in get_queryset function ? – Sion C Apr 27 '21 at 05:03
  • It's worth asking a new question. – frost-nzcr4 Apr 27 '21 at 09:28
-1

You can use Q objects for this. They can be negated with the ~ operator and combined much like normal Python expressions:

from django.db.models import Q
objs = Language.objects.filter(~Q(code_name = 'ar' ))

more info is here

Saeed Ramezani
  • 462
  • 6
  • 20