19

I want use django ORM to finish count group by, but select sql unexpected limit 21 every time. I don't want the limit, why limit 21 appear, and how can I get all data?

model:

class Company(models.Model):
    company_no = models.CharField(primary_key=True, max_length=128)
    name = models.CharField(max_length=128)
    is_test = models.BooleanField(default=False)
    class Meta:
        db_table = 'company'

class User(models.Model):
    symbol = models.BigIntegerField(primary_key=True)
    is_test = models.BooleanField(default=False)

    class Meta:
        db_table = 'user'

class UserEmploy(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, db_column='symbol', to_field='symbol', related_name='employ')
    company = models.ForeignKey(Company, on_delete=models.CASCADE, to_field='company_no', db_column='company_no', related_name='user_employ')
    class Meta:
        db_table = 'user_employ'

django code in my views:

   employ_qs_exclude_test = UserEmploy.objects\
        .exclude(user__is_test__in=utils.IS_TEST_MODE)\
        .values("company__name") \
        .annotate(employ_count=Count('user', distinct=True))\
        .order_by('company')

sql log:

SELECT `company`.`name`, COUNT(DISTINCT `user_employ`.`symbol`) AS `employ_count` FROM `user_employ`
INNER JOIN `user`
ON (`user_employ`.`symbol` = `user`.`symbol`)
INNER JOIN `company` 
ON (`user_employ`.`company_no` = `company`.`company_no`)
WHERE NOT (`user`.`is_test` IN (1))
GROUP BY `company`.`name`, `company`.`created_at`
ORDER BY `company`.`created_at` DESC LIMIT 21;
Iain Shelvington
  • 31,030
  • 3
  • 31
  • 50
ivy
  • 191
  • 1
  • 4
  • Could you also add the full django view here? – AKS Mar 22 '20 at 08:04
  • 5
    The 21 limit it usually applied when the query is printed (when `repr` is called on the `QuerySet`) so that the string representation is not ridiculously long for large queries – Iain Shelvington Mar 22 '20 at 08:06

1 Answers1

23

As suggested by Iain in a comment above, a LIMIT 21 is automatically added when taking the repr() of a queryset, which also happens implicitely when printing a queryset.

To get the full output, you can force the queryset into a list before printing, e.g. instead of print(qs) you would write print(list(qs)) (which shows all data, but omits the queryset classname).

The relevant code is here and here:

REPR_OUTPUT_SIZE = 20

def __repr__(self):
    data = list(self[:REPR_OUTPUT_SIZE + 1])
    if len(data) > REPR_OUTPUT_SIZE:
        data[-1] = "...(remaining elements truncated)..."
    return '<%s %r>' % (self.__class__.__name__, data)

Note that the LIMIT 21 will also be present on queries generated by calling .get() on a queryset. This is intended as a safeguard against the databasse returning a ton of data that will be thrown away anyway. The limit could have been 2, but using 21 means that the error can tell you how many records were actually found (provided it is less than 21). This does give some performance penalty, but only in the multiple records case, which should be an exceptional situation. When there is just one record there is no overhead.

See discussion of this here and code here:

MAX_GET_RESULTS = 21
Matthijs Kooijman
  • 2,498
  • 23
  • 30
  • 1
    I think 2 would be fine to report MultipleObjects in get() method. – Siraj Alam Jan 04 '22 at 20:10
  • This was a pure headache. The `LIMIT 21` even pops up when you print one of the elements of a queryset response. `>>> x[1]` where `x` is a queryset. For testing, I found printing `x[1].pk` was much helpful – Hritik Feb 12 '22 at 10:39
  • PR raised: https://github.com/django/django/pull/15431 – Siraj Alam Feb 15 '22 at 19:36
  • Thanks for the link. Seems the PR (changing the limit to 2) was closed, because the value of 21 was intentionally chosen. From the PR I followed some links and found the original discussion where they decided this value. I have updated my answer with the rationale and link accordingly. – Matthijs Kooijman Feb 18 '22 at 12:23