1

I am working on a personal project and I am trying to write a complex query that:

  1. Gets every device that belongs to a certain user

  2. Gets every sensor belonging to every one of the user's devices

  3. Gets the last recorded value and timestamp for each of the user's devices sensors.

I am using Sqlite, and I managed to write the query as plain SQL, however, for the life of me I cannot figure out a way to do it in django. I looked at other questions, tried going through the documentation, but to no avail.

My models:

class User(AbstractBaseUser):
    email = models.EmailField()

class Device(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField()

class Unit(models.Model):
    name = models.CharField()

class SensorType(models.Model):
    name = models.CharField()
    unit = models.ForeignKey(Unit)

class Sensor(models.Model):
    gpio_port = models.IntegerField()
    device = models.ForeignKey(Device)
    sensor_type = models.ForeignKey(SensorType)

class SensorData(models.Model):
    sensor = models.ForeignKey(Sensor)
    value = models.FloatField()
    timestamp = models.DateTimeField()

And here is the SQL query:

SELECT acc.email, 
           dev.name as device_name, 
           stype.name as sensor_type,
           sen.gpio_port as sensor_port,
           sdata.value as sensor_latest_value, 
           unit.name as sensor_units, 
           sdata.latest as value_received_on
FROM devices_device as dev
INNER JOIN accounts_user  as acc on dev.user_id = acc.id
INNER JOIN devices_sensor  as sen on sen.device_id = dev.id
INNER JOIN devices_sensortype as stype on stype.id = sen.sensor_type_id
INNER JOIN devices_unit as unit on unit.id = stype.unit_id
LEFT JOIN (
            SELECT MAX(sd.timestamp) latest, sd.value, sensor_id
            FROM devices_sensordata as sd
            INNER JOIN devices_sensor as s ON s.id = sd.sensor_id
        GROUP BY sd.sensor_id) as sdata on sdata.sensor_id= sen.id
WHERE acc.id = 1
ORDER BY dev.id

I have been playing with the django shell in order to find a way to implement this query with the QuerySet API, but I cannot figure it out...

The closest I managed to get is with this:

>>> sub = SensorData.objects.values('sensor_id', 'value').filter(sensor_id=OuterRef('pk')).order_by('-timestamp')[:1]
>>> Sensor.objects.annotate(data_id=Subquery(sub.values('sensor_id'))).filter(id=F('data_id')).values(...)

However it has two problems:

  1. It does not include the sensors that do not yet have any values in SensorsData
  2. If i include the SensorData.values field into the .values() I start to get previously recorded values of the sensors

If someone could please show me how to do it, or at least tell me what I am doing wrong I will be very grateful!

Thanks!

P.S. Please excuse my grammar and spelling errors, I am writing this in the middle of the night and I am tired.

EDIT: Based on the answers I should clarify: I only want the latest sensor value for each sensor. For example I have In sensordata:

id | sensor_id | value | timestamp|
1  |  1             |  2       |  <today>   |
2  |  1             |  5       | <yesterday>|
3  |  2             |  3       | <yesterday>|

Only the latest should be returned for each sensor_id:

id |   sensor_id    |   value  |  timestamp |
1  |  1             |  2       |  <today>   |
3  |  2             |  3       | <yesterday>|

Or if the sensor does not yet have any data in this table, I waant the query to return a record of it with "null" for value and timestamp (basically the left join in my SQL query).

EDIT2:

Based on @ivissani 's answer, I managed to produce this:

>>> latest_sensor_data = Sensor.objects.annotate(is_latest=~Exists(SensorData.objects.filter(sensor=OuterRef('id'),timestamp__gt=OuterRef('sensordata__timestamp')))).filter(is_latest=True)
>>> user_devices = latest_sensor_data.filter(device__user=1)
>>> for x in user_devices.values_list('device__name','sensor_type__name', 'gpio_port','sensordata__value', 'sensor_type__unit__name', 'sensordata__timestamp').order_by('device__name'):
...     print(x)

Which seems to do the job.

This is the SQL it produces:

    SELECT
  "devices_device"."name",
  "devices_sensortype"."name",
  "devices_sensor"."gpio_port",
  "devices_sensordata"."value",
  "devices_unit"."name",
  "devices_sensordata"."timestamp"
FROM
  "devices_sensor"
  LEFT OUTER JOIN "devices_sensordata" ON (
    "devices_sensor"."id" = "devices_sensordata"."sensor_id"
  )
  INNER JOIN "devices_device" ON (
    "devices_sensor"."device_id" = "devices_device"."id"
  )
  INNER JOIN "devices_sensortype" ON (
    "devices_sensor"."sensor_type_id" = "devices_sensortype"."id"
  )
  INNER JOIN "devices_unit" ON (
    "devices_sensortype"."unit_id" = "devices_unit"."id"
  )
WHERE
  (
    NOT EXISTS(
      SELECT
        U0."id",
        U0."sensor_id",
        U0."value",
        U0."timestamp"
      FROM
        "devices_sensordata" U0
      WHERE
        (
          U0."sensor_id" = ("devices_sensor"."id")
          AND U0."timestamp" > ("devices_sensordata"."timestamp")
        )
    ) = True
    AND "devices_device"."user_id" = 1
  )
ORDER BY
  "devices_device"."name" ASC
Slav
  • 147
  • 2
  • 13
  • This post can help too https://stackoverflow.com/questions/48128714/how-to-make-an-inner-join-in-django – Geancarlo Murillo Apr 12 '19 at 01:41
  • What do you want to do with the data? Do you need them in a flat table (think Excel) or as nested objects (in a tree-like structure)? – Endre Both Apr 12 '19 at 06:26
  • Both should be fine. I am going to use them in a view and display them to the user. Whichever is more robust. – Slav Apr 12 '19 at 11:21

4 Answers4

0

For this kind of queries, I recommend strongly to use Q objects, here the docs https://docs.djangoproject.com/en/2.2/topics/db/queries/#complex-lookups-with-q-objects

Geancarlo Murillo
  • 509
  • 1
  • 5
  • 14
0

Something like this?:

Multiple Devices for 1 User

device_ids = Device.objects.filter(user=user).values_list("id", flat=True)
SensorData.objects.filter(sensor__device__id__in=device_ids
                          ).values("sensor__device__name", "sensor__sensor_type__name", 
                                   "value","timestamp").order_by("-timestamp")

1 Device, 1 User

SensorData.objects.filter(sensor__device__user=user
                          ).values("sensor__device__name", "sensor__sensor_type__name", 
                                   "value", "timestamp").order_by("-timestamp")

That Queryset will:

1.Gets every device that belongs to a certain user

2.Gets every sensor belonging to every one of the user's devices (but it return sensor_type every sensor cause there is no name field there so i return sensor_type_name)

3.Gets all recorded(order by the latest timestamp) value and timestamp for each of the user's devices sensors.

UPDATE

try this:

list_data=[]
for _id in device_ids:
    sensor_data=SensorData.objects.filter(sensor__device__user__id=_id)
    if sensor_data.exists():
        data=sensor_data.values("sensor__id", "value", "timestamp", "sensor__device__user__id").latest("timestamp")
        list_data.append(data)
dpht
  • 156
  • 4
  • This is close, however, I only need the last value of each sensor. Please see my the edit on my question. – Slav Apr 12 '19 at 11:10
0

It's perfectly fine to execute raw queries with django, especially if they are that complex.

If you want to map the results to models use this : https://docs.djangoproject.com/en/2.2/topics/db/sql/#performing-raw-queries

Otherwise, see this : https://docs.djangoproject.com/en/2.2/topics/db/sql/#executing-custom-sql-directly

Note that in both cases, no checking is done on the query by django. This means that the security of the query is your full responsability, sanitize the parameters.

Loïc
  • 11,804
  • 1
  • 31
  • 49
  • This is how I have implemented it now, but I would like to avoid using raw queries for the reasons you have specified yourself. – Slav Apr 12 '19 at 11:15
  • if ```acc.id``` is the only parameter, just cast it to integer using ```int()``` that will prevent any SQL Injection – Loïc Apr 12 '19 at 11:46
0

Actually your query is rather simple, the only complex part is to establish which SensorData is the latest for each Sensor. I would go by using annotations and an Exists subquery in the following way

latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True)

Then it's just a matter of filtering this queryset by user in the following way:

certain_user_latest_data = latest_data.filter(sensor__device__user=certain_user)

Now as you want to retrieve the sensors even if they don't have any data this query will not suffice as only SensorData instances are retrieved and the Sensor and Device must be accessed through fields. Unfortunately Django does not allow for explicit joins through its ORM. Therefore I suggest the following (and let me say, this is far from ideal from a performance perspective).

The idea is to annotate the Sensors queryset with the specific values of the latest SensorData (value and timestamp) if any exists in the following way:

latest_data = SensorData.objects.annotate(
    is_latest=~Exists(
        SensorData.objects.filter(sensor=OuterRef('sensor'),
                                  timestamp__gt=OuterRef('timestamp'))
    )
).filter(is_latest=True, sensor=OuterRef('pk'))

sensors_with_value = Sensor.objects.annotate(
    latest_value=Subquery(latest_data.values('value')),
    latest_value_timestamp=Subquery(latest_data.values('timestamp'))
)  # This will generate two subqueries...

certain_user_sensors = sensors_with_value.filter(device__user=certain_user).select_related('device__user')

If there aren't any instances of SensorData for a certain Sensor then the annotated fields latest_value and latest_value_timestamp will simply be set to None.

ivissani
  • 2,614
  • 1
  • 18
  • 12
  • Ok, so I've missed that you wanted to get the sensors even if they do not have data yet, let me elaborate on this a little bit – ivissani Apr 12 '19 at 11:35
  • You sir, are a ROCKSTAR! Thank you! Your code works, except for the sensors without data, but I think I managed to modify your answer to fix this. Please see the edit on my question. – Slav Apr 12 '19 at 12:31
  • Oh, I just saw that you posted an elaboration. I think I managed to find another way, I wrote it as edit2 on my question. I do not use subqueries, what do you think? – Slav Apr 12 '19 at 12:34
  • Not quite sure your second edit does what you want... cause there you don't have a reference to the actual latest `SensorData` instance. Did you change your models? because I see you used `sensordata` expansion from your `Sensor` model but I fail to see what it refers to as the relation (`ForeignKey`) goes in the opposite direction... – ivissani Apr 12 '19 at 12:37
  • Correction, I'm quite sure that your second edit is NOT doing what you want... that query will return multiple records for the same sensor, one for each `SensorData` instance there is for that sensor. Not only for the latest one. – ivissani Apr 12 '19 at 12:45
  • Hm, that is odd. My original SQL query, and my edit2 produce the same results. I even tried to add another value for one of the sensors. Both queries updated and showed only the newest value. I have not changed my models, however it was late when I posted the question, so I might have missed something. I will check them later and I will post again if there is a correction. EDIT: Also ran the generated query from django directly in the db. Results are in different order, but otherwise the same. – Slav Apr 12 '19 at 13:50
  • Clearly the SQL you posted is fine, what I struggle to see is how it was produced. You used the `sensordata__` expansion but as far as I can tell that should not work in your models as they are in your post... yet I may be wrong on this one... – ivissani Apr 12 '19 at 14:20
  • Okay, I copy-pasted the relevant info about the models in my app in the question. I also tried to run "makemigrations" to make sure there aren't any changes to them, and there weren't. I do not think there was a difference with the ones I posted originally, but you can take a look again if you like. Anyway I think the problem is solved for now. If this turns out to not be the case at a later time, I will reopen the question. Thank you again for your help! – Slav Apr 12 '19 at 15:36