12

Running distinct() on any field of the comment model always returns all the records,

Comment.objects.values('user').distinct()

[{'user': 1}, {'user': 0}, {'user': 0}, {'user': 0}, {'user': 0}, {'user': 1}, {'user': 1}, {'user': 1}, {'user': 1}]

Comment.objects.values('ip_address').distinct()

[{'ip_address': u'127.0.0.1'},{'ip_address': u'192.168.0.180'}, {'ip_address':u'192.168.0.180'}, {'ip_address': u'192.168.0.180'}, {'ip_address': u'192.168.0. 180'}, {'ip_address': u'192.168.0.180'}, {'ip_address': u'192.168.0.180'}, {'ip_address': u'192.168.0.180'}, {'ip_address': u'192.168.0.180'}]

Why is this happening? Is there a way around this? Thanks!

ps: distinct() does run very well in different types of fields of a custom model during my test. Something special about the Comments framework?

Bit of conclusion Thanks everybody answering this question, combined with some reading I get conclusion as following:

  1. values() influences the lookup fields in SELECT part of the final sql("values() takes optional positional arguments, *fields, which specify field names to which the SELECT should be limited")
  2. order_by() adds its parameter to the SELECT part as well.
  3. using distinct() in a look up will result the sql to look like this:

    SELECT DISTINCT [fields1, fields2, fields3] FROM ... WHERE...

    and the values of the fields all together decides whether a record is unique. The fields may come from values() or order_by() functions in the lookup.

  4. So the order_by() is adding some unwanted effects when combined with distinct(), the fields specified in order_by is also take into consideration whether a record is unique

  5. Django Comment has a hidden order_by parameter by default, thus creating the whole problem. Any model has a hidden order_by when returning the qs can cause the same problem.

  6. The way of solving it is by adding an empty order_by() at the end of the lookup, which removes the default order_by.
Xun Yang
  • 4,209
  • 8
  • 39
  • 68

3 Answers3

21
Comment.objects.values('user').distinct().order_by()
DrTyrsa
  • 31,014
  • 7
  • 86
  • 86
  • 1
    if the OP is looking for distinct user IDs, both your solutions do not work at all – Simon Kagwi Jan 26 '12 at 12:41
  • @SimonKagwi And what do they return? – DrTyrsa Jan 26 '12 at 12:43
  • Hi DrTyrsa! I'm using django 1.3 and the parameter of distinct() is not documented in this version. The second command doesn't work for Comment model, but fine on a custom model(like what I've said in PS). But why 'values' shouldn't affect 'distinct' behavior? If you don't tell django which column's values should be used for grouping, how can it know in which way you want to treat records as unique? – Xun Yang Jan 26 '12 at 12:57
6

I haven't verified that this is the cause, but Comment model has a default ordering which influences distinct() method:

In [1]: print Comment.objects.values('ip_address').distinct().query
SELECT DISTINCT "django_comments"."ip_address", "django_comments"."submit_date" FROM "django_comments" ORDER BY "django_comments"."submit_date" ASC

It's a documented feature.

Now, how could it be that two comments have exactly the same timestamp? I suppose you're using MySQL which doesn't support anything less than a second.

And if you want to get rid of the default ordering, just do:

Comment.objects.order_by().values('ip_address').distinct()
Tomasz Zieliński
  • 16,136
  • 7
  • 59
  • 83
2

You can wrap your query in set;

distinct() does not go well with values() as per documentation

ip_sets = set(Comment.objects.order_by().values('ip_address'))
ip_list = list(set(Comment.objects.order_by().values('ip_address')))
7guyo
  • 3,047
  • 1
  • 30
  • 31