0

I have a MySQL table with 13M rows. I can query the db directly as

SELECT DISTINCT(refdate) FROM myTable

The query takes 0.15 seconds and is great.

The equivalent table defined as a Django model and queried as

myTable.objects.values(`refdate`).distinct()

takes a very long time. Is it because there are too many items in the list before distinct(). How do I do this in a manner that doesn't bring everything down?

Vishal
  • 2,097
  • 6
  • 27
  • 45
  • The SQL generated by your code above will be pretty much the same as what you run manually: `SELECT DISTINCT mytable.refdate FROM mytable` so it's weird that it runs so much slower. can you profile using django-debug-toolbar to make sure that it is the query itself that is so slow? – solarissmoke Oct 27 '15 at 04:58
  • I have tried using the django model from python shell and had the issue outlined above. I never even got to trying to load a page and watching it hang. – Vishal Oct 28 '15 at 00:43
  • 1
    Can you dump `connection.queries` after running the query in shell? (`from django.db import connection`)? – solarissmoke Oct 28 '15 at 04:13
  • @solarissmoke **Thank you!** That helped me fill in the blanks! – Vishal Oct 28 '15 at 09:10

2 Answers2

1

Thank you @solarissmoke for the pointer to connection.queries.

I was expecting to see

SELECT DISTINCT refdate FROM myTable

Instead, I got

SELECT DISTINCT refdate, itemIndex, itemType FROM myTable ORDER BY itemIndex, refdate, itemType. 

I then looked at myTable defined in models.py.

unique_together = (('nodeIndex', 'refdate', 'nodeType'), )
ordering = ['nodeIndex', 'refdate', 'nodeType']

From Interaction with default ordering or order_by

normally you won’t want extra columns playing a part in the result, so clear out the ordering, or at least make sure it’s restricted only to those fields you also select in a values() call.

So I tried order_by() to flush the previously defined ordering and voila!

myTable.objects.values('refdate').order_by().distinct()
Vishal
  • 2,097
  • 6
  • 27
  • 45
0

You can try this:

myTable.objects.all().distinct('refdate')
Nguyễn Hải Triều
  • 1,454
  • 1
  • 8
  • 14