0

I have the following table in my Django Models:

Date         | Field1 | Field2 | Field3 | Field4 | Field5 | Field6 |

05/01/2018   |      7 |      1 |      8 |      4 |      9 |      2 |
05/02/2018   |      9 |      2 |      1 |      1 |      6 |      2 |
05/03/2018   |      8 |      1 |      3 |      1 |      5 |      1 |
05/04/2018   |      7 |      1 |      8 |      4 |      9 |      2 |
05/05/2018   |      7 |      1 |      8 |      4 |      9 |      2 |

I would like to return the names of the top 3 fields with the highest sum over a custom time period.

For example if I want to get the NAMES of the Top 3 fields with the highest sum between 05/02/2018 and 05/04/2018, this should return the following:

'Field1, Field5, Field3'

because:

Date         | Field1 | Field2 | Field3 | Field4 | Field5 | Field6 |

05/02/2018   |      9 |      2 |      1 |      1 |      6 |      2 |
05/03/2018   |      8 |      1 |      3 |      1 |      5 |      1 |
05/04/2018   |      7 |      1 |      8 |      4 |      9 |      2 |

SUM          |     24 |      4 |     12 |      6 |     20 |      5 |

I could do the following:

1) Return a list of field names (attributes of my Django model's class) using __dict__

2) Loop through all field names, sum up their values in the table by filtering for dates and using .aggregate(SUM()) and put my results into a dictionary of {FieldNames: Sums}

3) And finally sort the dictionary out by values (Sums) and extract the keys (Field Names) corresponding to the top 3 values.

This kind of seems like it would be overkill and I'am wondering if there is a more Pythonic/Django-esque way of achieving this using Querysets?

Lawless Leopard
  • 59
  • 2
  • 10
  • 1
    I think that is bad design to store it in different fields then. In that case it would probably be better to *unpivot* the data. The idea of columns is actually that those are *orthogonal* data elements, so that one column typically does *not* (or very limited) interacts with another column. – Willem Van Onsem Jul 13 '18 at 14:57
  • @WillemVanOnsem Thank you for your input, I agree with you unfortunately I am dealing with legacy code here, and one of the requirements for the job at hand is to keep the Data Base format as is and to not alter it under any circumstance. I just have to work around the current DB format. – Lawless Leopard Jul 13 '18 at 15:06

1 Answers1

0

I believe that conditional aggregation could solve this; you would aggregate the sum of each field something like...

https://docs.djangoproject.com/en/2.0/ref/models/conditional-expressions/#case

from django.db.models import Case, When, IntegerField
from django.db.models.functions import Cast

uglyquery = ModelName.objects.aggregate(
     field_name_sum=Sum(
         Case(
             When(
                 Q(timestamp__lte=date.today() & Q(timestamp __gte=start_date)),
                 then=Cast('field_name',IntegerField()) # pretty sure the Cast is necessary
             ), output_field= IntegerField())
         )
     ),
     field_2_sum=....
 )

will return a dictionary of all of your fields and sums which then can be sorted.

Now, because we can't sort a dict in python, from operator import itemgetter and we have...

How do I sort a dictionary by value?

sorted(uglyquery.items(), key=itemgetter(1))[3:6] 
# slice the first three results and leave only the remaining that we are interested in

telling the sorted function to sort by our sum on each field, this method returns the fields in ascending order as a list of tuple ie [('Field2',4), ('Field6',5), ...]

Because we sliced the lowest values, we are left with the three fields with the highest sum of the 6 in that date range. :)