7

I have two unrelated tables with the same primary key.

ip            mac
11.11.11.11   48-C0-09-1F-9B-54
33.33.33.33   4E-10-A3-BC-B8-9D
44.44.44.44   CD-00-60-08-56-2A
55.55.55.55   23-CE-D3-B1-39-A6

ip            type     owner
22.22.22.22   laptop   John Doe
33.33.33.33   server   XYZ Department
44.44.44.44   VM       Mary Smith
66.66.66.66   printer  ZWV Department

The first table is automatically refreshed every minute. I can't change the database structure or the script that populates it.

Both tables have ip as PRIMARY KEY.

In a view, I would like to display a table like this:

ip           mac               type    owner          Alert
11.11.11.11  48-C0-09-1F-9B-54                        Unauthorized
55.55.55.55  23-CE-D3-B1-39-A6                        Unauthorized
22.22.22.22                    laptop  John Doe       Down
66.66.66.66                    printer ZWV Department Down
33.33.33.33  4E-10-A3-BC-B8-9D server  XYZ Department OK
44.44.44.44  CD-00-60-08-56-2A VM      Mary Smith     OK

How can I model this? Should I make one of the two primary keys a foreign key into the other one?

Once the code is in operation, there will be lots of data, so I want to make sure it's fast enough.

What is the fastest way to retrieve the data?


Update:

I tried using OneToOneField for the second table.

This helps me get records that are in both tables, and the records for unauthorized devices (IPs missing in second table):

ip           mac               type    owner          Alert
11.11.11.11  48-C0-09-1F-9B-54                        Unauthorized
55.55.55.55  23-CE-D3-B1-39-A6                        Unauthorized
33.33.33.33  4E-10-A3-BC-B8-9D server  XYZ Department OK
44.44.44.44  CD-00-60-08-56-2A VM      Mary Smith     OK

but I can't get the devices that are down (IP's missing in first table):

22.22.22.22                    laptop  John Doe       Down
66.66.66.66                    printer ZWV Department Down

I asked for help here, but it seems it can't be done with OneToOneField

Granny Aching
  • 1,295
  • 12
  • 37
  • You need to share more about the data flow if you want help with modelling. Are `ip` and `mac` unique in the first table? On what basis do you associate IPs to devices in the second table? (If anything, I would expect MACs to be associated to devices.) Where do the user data come from? – Endre Both Apr 12 '19 at 18:30
  • `ip` is primary key in both, so it's unique. `mac` doesn't matter; it's just one of the fields. There are other fields... I want to associate the two tables by `ip`. – Granny Aching Apr 12 '19 at 19:00

3 Answers3

6

General idea

You can use qs.union:

  • create 2 models without any relations between them. Don't forget to use class Meta: managed = False
  • select from the first model, annotate with subquery and union with second:
from django.db import models
from django.db.models import F, OuterRef, Subquery, Value
from django.db.models.functions import Coalesce

# OperationalDevice fields: ip, mac
# AllowedDevice fields: ip, type, owner

USE_EMPTY_STR_AS_DEFAULT = True

null_char_field = models.CharField(null=True)
if USE_EMPTY_STR_AS_DEFAULT:
    default_value = ''
else:
    default_value = None

# By default Expressions treat strings as "field_name" so if you want to use
# empty string as a second argument for Coalesce, then you should wrap it in
# `Value()`.
# `None` can be used there without wrapping in `Value()`, but in
# `.annotate(type=NoneValue)` it still should be wrapped, so it's easier to
# just "always wrap".
default_value = Value(default_value, output_field=null_char_field)

operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef('ip'))


qs1 = (
    AllowedDevice.objects
    .all()
    .annotate(
        mac=Coalesce(
            Subquery(operational_devices_subquery.values('mac')[:1]),
            default_value,
            output_field=null_char_field,
        ),
    )
)

qs2 = (
    OperationalDevice.objects
    .exclude(
        ip__in=qs1.values('ip'),
    )
    .annotate(
        type=default_value,
        owner=default_value,
    )
)

final_qs = qs1.union(qs2)

Generic approach for multiple fields

A more complex but "universal" approach may use Model._meta.get_fields(). It will be easier to use for cases where "second" model have more that 1 extra field (not only ip,mac). Example code (not tested, but gives general impression):

# One more import:
from django.db.models.fields import NOT_PROVIDED

common_field_name = 'ip'

# OperationalDevice fields: ip, mac, some_more_fields ...
# AllowedDevice fields: ip, type, owner

operational_device_fields = OperationalDevice._meta.get_fields()
operational_device_fields_names = {_f.name for _f in operational_device_fields}  # or set((_f.name for ...))

allowed_device_fields = AllowedDevice._meta.get_fields()
allowed_device_fields_names = {_f.name for _f in allowed_device_fields}  # or set((_f.name for ...))

operational_devices_subquery = OperationalDevice.objects.filter(ip=OuterRef(common_field_name))

left_joined_qs = (  # "Kind-of". Assuming AllowedDevice to be "left" and OperationalDevice to be "right"
    AllowedDevice.objects
    .all()
    .annotate(
        **{
            _f.name: Coalesce(
                Subquery(operational_devices_subquery.values(_f.name)[1]),
                Value(_f.get_default()),  # Use defaults from model definition
                output_field=_f,
            )
            for _f in operational_device_fields
            if _f.name not in allowed_device_fields_names
            # NOTE: if fields other than `ip` "overlap", then you might consider
            # changing logic here. Current implementation keeps fields from the
            # AllowedDevice
        }
        # Unpacked dict is partially equivalent to this:
        # mac=Coalesce(
        #     Subquery(operational_devices_subquery.values('mac')[:1]),
        #     default_for_mac_eg_fallback_text_value,
        #     output_field=null_char_field,
        # ),
        # other_field = Coalesce(...),
        # ...
    )
)

lonely_right_rows_qs = (
    OperationalDevice.objects
    .exclude(
        ip__in=AllowedDevice.objects.all().values(common_field_name),
    )
    .annotate(
        **{
            _f.name: Value(_f.get_default(), output_field=_f),  # Use defaults from model definition
            for _f in allowed_device_fields
            if _f.name not in operational_device_fields_names
            # NOTE: See previous NOTE
        }
    )
)

final_qs = left_joined_qs.union(lonely_right_rows_qs)

Using OneToOneField for "better" SQL

Theoretically you can use device_info = models.OneToOneField(OperationalDevice, db_column='ip', primary_key=True, related_name='status_info'): in AllowedDevice. In this case your first QS may be defined without use of Subquery:

from django.db.models import F

# Now 'ip' is not in field names ('device_info' is there), so add it:
allowed_device_fields_names.add(common_field_name)

# NOTE: I think this approach will result in a more compact SQL query without 
# multiple `(SELECT "some_field" FROM device_info_table ... ) as "some-field"`.
# This also might result in better query performance.
honest_join_qs = (
    AllowedDevice.objects
    .all()
    .annotate(
        **{
            _f.name: F(f'device_info__{_f.name}')
            for _f in operational_device_fields
            if _f.name not in allowed_device_fields_names
        }
    )
)

final_qs = honest_join_qs.union(lonely_right_rows_qs)
# or:
# final_qs = honest_join_qs.union(
#     OperationalDevice.objects.filter(status_info__isnull=True).annotate(**missing_fields_annotation)
# )
# I'm not sure which approach is better performance-wise...
# Commented one will use something like:
# `SELECT ... FROM "device_info_table" LEFT OUTER JOIN "status_info_table" ON ("device_info_table"."ip" = "status_info_table"."ip") WHERE "status_info_table"."ip" IS NULL
#
# So it might be a little better than first with `union(QS.exclude(ip__in=honest_join_qs.values('ip'))`.
# Because later uses SQL like this:
# `SELECT ... FROM "device_info_table" WHERE NOT ip IN (SELECT ip FROM "status_info_table")`
#
# But it's better to measure timings of both approaches to be sure.
# @GrannyAching, can you compare them and tell in the comments which one is better ?

P.S. To automate models definition you can use manage.py inspectdb

P.P.S. Maybe multi-table inheritance with custom OneToOneField(..., parent_link=True) may be more helpful for you than using union.

imposeren
  • 4,142
  • 1
  • 19
  • 27
  • Question - both tables have multiple fields. For example, the table with macs has fqdn (dfds.company.com), open ports, ... Do I do multiple `coalesce` calls, one for each field? – Granny Aching May 06 '19 at 15:04
  • I was able to tweak your code using multiple separate `coalesce`/`subquery` calls, one for each field, and it's kind-of working. I'm still tweaking it. It feels ugly to have 4 separate subqueries. – Granny Aching May 08 '19 at 14:52
  • Yes, when it comes to multiple fields from same query, then django ORM is a bit ugly: currently there is no good built-in way for this – imposeren May 08 '19 at 14:59
  • It took me forever to figure out how to use a default value in Coalescence instead of `None`... – Granny Aching May 08 '19 at 17:51
  • Ooop. sorry about that. Will add a comment in the answer – imposeren May 10 '19 at 21:13
  • It also might be useful to utilize `Model._meta.get_fields`... Added example of it in the answer – imposeren May 10 '19 at 21:58
  • @GrannyAching, I've also added more detailed example of usage together with OneToOneField: This approach might be better performance-wise, because final SQL query will use less subqueries like `SELECT ... WHERE NOT ip IN (SELECT ip from other_table )`, or `SELECT bar, (SELECT foo FROM other_table WHERE ...) AS foo FROM ...` – imposeren May 10 '19 at 22:58
5

Since ip is primary key in both an the first table is getting updated frequently, I suggest updating the second table and converting the ip in the second table to have ip of the first table as a OneToOneField.

This is how your models should look like:

class ModelA(models.Model):
    ip = models.GenericIPAddressField(unique=True)
    mac = models.CharField(max_length=17, null=True, blank=True)

class ModelB(models.Model):
    ip = models.OneToOneField(ModelA)
    type = models.CharField()
    owner = models.CharField()

docs

You can also have the one to one relation using a separate column:

class ModelB(models.Model):
    ip = models.GenericIPAddressField(unique=True) 
    type = models.CharField()
    owner = models.CharField()
    modelA = models.OneToOneField(ModelA)

So now you can have the ip address as the primary key, and you can still refer to the table ModelA using the field modelA.

Aman Garg
  • 2,507
  • 1
  • 11
  • 21
  • How can I use these models to get all the records of ModelB that have no matching ModelA record? – Granny Aching Apr 29 '19 at 13:34
  • `ModelB.objects.filter(ip__isnull=True)` – Aman Garg Apr 30 '19 at 04:47
  • No. This generates query `... WHERE tableB.ip IS NULL` – Granny Aching May 01 '19 at 17:37
  • `IP's missing in first table`.. when the `modelA` gets updated every minute, do you delete entries from the table? – Aman Garg May 01 '19 at 19:16
  • Since both ip are primary key, you can create the relation using an additional column – Aman Garg May 01 '19 at 19:21
  • `...do you delete entries from the table` - One table shows devices that are in the network - this table is automatically refreshed every minute. Table is zap-ed and recreated. The other table lists devices that are allowed to be in the network. It gets changed when a new device is acquired, for example. Discrepancies are if someone tries to sneak in an unauthorized device, or if an authorized device is down. – Granny Aching May 01 '19 at 19:32
  • `...create the relation using an additional column` - can you elaborate, please? – Granny Aching May 01 '19 at 19:33
  • see the updated answer for additional column. When the row in `modelA` is created, the corresponding row in `modelB` needs to have a relation with `modelA`, when a row is deleted, the relation needs to be set to null. – Aman Garg May 02 '19 at 05:21
  • But that is not the case in my database. Each of the two tables has rows that don't have a corresponding row in the other table. See the example in my question. – Granny Aching May 06 '19 at 15:08
3

Once you have a value from one of both tables just do a query into the other one, looking for id. Since these two tables are separated you must do an extra query. You don't need to create an explicit relation, since you are looking into its "id/ip". So once you have a first value, named 'first_object', just look for its relative into the other table.

other_columns = ModelB.objects.get(id=first_object.id)

Then if you want just 'add' the desired columns to the other model and sent a single object to whatever you want:

first_object.attr1 = other_columns.attr1
...
Lu Chinke
  • 622
  • 2
  • 8
  • 27