2

I have a postgresql database connected to with django. In database there is lots of big tables which I want their row count. Because of large size of tables this takes a lot of time to execute.

I founded that the approximate count could be retrieved from pg_class. Is there any way to this in Django and not executing raw query?

mohammad
  • 2,232
  • 1
  • 18
  • 38

1 Answers1

3

I propose to use a dedicated package for this named django-postgres-fuzzycount [GitHub]. This package provides a manager that will do the fast counting.

You can install the package with pip (for example in your local environment):

$ pip install django-postgres-fuzzycount

Then you can add the FuzzyCountManager to the models where you want to obtain an approximative count:

from django.db import models
from fuzzycount import FuzzyCountManager

class SomeModel(models.Model):

    #  ... (some fields) ...

    objects = models.Manager()
    approx = FuzzyCountManager()

and then you can count approximatively with:

SomeModel.approx.count()

In case you .filter(..), Django will calculate the real number, since the pg_class table only stores an apprixmative number of rows for the entire table, so:

SomeModel.approx.filter(foo=bar).count()

will take more time (depending on indexes, etc.).

You can also "patch" the objects manager directly, but then obtaining the real number of records might be harder:

from django.db import models
from fuzzycount import FuzzyCountManager

class SomeModel(models.Model):

    #  ... (some fields) ...

    objects = FuzzyCountManager()

It is also nice that if you change the backend database to another database, the FuzzyCountManager(..) will act as a normal Manager, so in case you later change the database system, you do not have to rewrite the managers.

Willem Van Onsem
  • 443,496
  • 30
  • 428
  • 555
  • I don't want to "patch" the objects directly, because already have a custom manager, Added `approx = FuzzyCountManager()` but got the error: `AttributeError: 'QuerySet' object has no attribute 'approx'` – Miguel Mar 03 '21 at 12:16
  • @Miguel: you are likely working with `SomeModel.objects.approx`, but then it is a `QuerySet`, so that will not work. Note that `.objects` is only defined on the *model*, not on a manager or `QuerySet` that arises from that model. – Willem Van Onsem Mar 03 '21 at 12:21
  • @Miguel: you can however simply make a manager in the model `approx = FuzzyCountManager()`, and another `ojbects = models.Manager()`, then you thus can work with `SomeModel.approx` to count the number of elements approximately. – Willem Van Onsem Mar 03 '21 at 12:22
  • Ok, thanks. But what about count the queryset? That is what I need. – Miguel Mar 03 '21 at 12:56
  • @Miguel: you can count exactly with `SomeModel.objects.count()` or approximately with `SomeModel.approx.count()` if you implement it as in the first way (second code fragment). – Willem Van Onsem Mar 03 '21 at 13:01
  • Ok but I don't want to count the number of documents in the full collection, but only the documents I obtained from the query. – Miguel Mar 03 '21 at 15:33
  • @Miguel: but you can not count approximately a filtered set. The idea is that for an approximate set, PostgreSQLwill store the total size of the table (in another table), and update it regularly, but that of course does not work with an arbitrary filter. – Willem Van Onsem Mar 03 '21 at 15:37
  • Ok, thanks for clearing that out. A bit off-topic then, but any advice on what alternatives do I have to count an arbitrary filter? Because using `count` is really slow. – Miguel Mar 03 '21 at 15:41
  • @Miguel: usually that is slow if you filter on fields that have no index. In that case the database has to perform a linear search. Usually by adding indexes that can be optimized. – Willem Van Onsem Mar 03 '21 at 15:43
  • Already added. Even with that is quite slow. https://stackoverflow.com/questions/66457799/how-to-improve-count-of-text-query-for-django-with-postgres – Miguel Mar 03 '21 at 15:48