0

So I want to get the latest result for a list of values from the database.

Example:

Task
col1 col2
1   1
2   12
3   32
4   1
5   24
1   25
2   62
3   7
2   81
1   9       -> last occurence for '1' in 'col1'
4   10      -> last occurence for '4' in 'col1'
3   121
5   12      -> last occurence for '5' in 'col1'

There's a list 'z' and I want to write query such a way that I get the latest results from the table 'Task' such that z exists in col1. Assuming 'z' is:

z = [1, 4, 5]

I want the end result as:

col1 col2
1   9
4   10
5   12

There are 2 solutions that I came up with. Solution1 is as follows:

all_results = Task.objects.filter(col1__in=z)
results = dict()
for result in results:
    results[result.col1] = result

Solution2 is as follows:

results = dict()
for x in z:
    results[x] = Task.objects.filter(col1=x).reverse()[0]

But I'm wondering if there's a way to combine both solutions together such that I only have to make one database call and get results for each distinct 'col1'. Reason: My database is very big and currently I've implemented Solution2 which is making the entire process very slow. Solution1 is not feasible since there are multiple entries and 'results' dictionary is going to be very big and hence it'll be very time consuming.

Shrikant Kakani
  • 1,511
  • 2
  • 17
  • 37
  • It seems you are concerned about execution speed and space. I think this could be one way: If you know the frequency of `Task` table update, then fetching the last `n` rows (instead of filtering the ones in `z`) and doing a lookup operation on those rows if they are in `z`. If its able to find all the elements in `z` then you are good to go. If it couldn't find then you may want to increase `n` to `n'` and fetch other `n'` rows. It will save you space and time. But the catch here is the occurrence of items in `z` are frequently occuring enough. – stuartnox Sep 13 '16 at 23:21
  • @stuartnox the size of the table is in thousands if not millions. Also, the table is updated quite frequently, hence what value needs to be assigned to 'n' will be a big discussion. Also, in the above example z can also hold values such as [6, 7] and you can see that both [6, 7] does not exists in the table. Hence I'll keep updating 'n' until I reach the end. – Shrikant Kakani Sep 13 '16 at 23:32
  • If you don't have another key on which to order the columns (e.g. updated at), I don't believe table order is guaranteed. http://stackoverflow.com/questions/20050341/when-no-order-by-is-specified-what-order-does-a-query-choose-for-your-record – Alexander Sep 14 '16 at 00:01
  • see http://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group , don't think django ORM supports such queries, would need to run them in raw sql most likely – serg Sep 14 '16 at 02:51

1 Answers1

0

You may achieve this result from within the QuerySet. Your ORM query will be:

Task.objects.filter(col1__in=z).values('col1').annotate(latest_record=Max('col2'))
# where z = [1, 4, 5]
Moinuddin Quadri
  • 46,825
  • 13
  • 96
  • 126
  • The results do not depend on value of 'col2'. I've edited my question, hope that helps. and thanks for your answer. I'll try to emulate and will see if I can come up with the final intended answer. – Shrikant Kakani Sep 13 '16 at 23:45