4

In developing a website for indexing system documentation I've come across a tough nut to crack regarding data "matching"/relations across databases in Django.

A simplified model for my local database:

from django.db import models

class Document(models.Model):
    name = models.CharField(max_length=200)
    system_id = models.IntegerField()
    ...

Imagined model, system details are stored in a remote database.

from django.db import models               

class System(models.Model):     
    name = models.CharField(max_length=200)           
    system_id = models.IntegerField()      
    ...

The idea is that when creating a new Document entry at my website the ID of the related system is to be stored in the local database. When presenting the data I would have to use the stored ID to retrieve the system name among other details from the remote database.

I've looked into foreign keys across databases, but this seems to be very extensive and I'm not sure if I want relations. Rather I visualize a function inside the Document model/class which is able to retrieve the matching data, for example by importing a custom router/function.

How would I go about solving this?


Note that I won't be able to alter anything on the remote database, and it's read-only. Not sure if I should create a model for System aswell. Both databases use PostgreSQL, however my impression is that it's not really of relevance to this scenario which database is used.

Community
  • 1
  • 1
timss
  • 9,982
  • 4
  • 34
  • 56

3 Answers3

4

In the django documentation multi-db (manually-selecting-a-database)

# This will run on the 'default' database.
Author.objects.all()

# So will this.
Author.objects.using('default').all()

# This will run on the 'other' database.
Author.objects.using('other').all()

The 'default' and 'other' are aliases for you databases.
In your case it would could be 'default' and 'remote'.

of course you could replace the .all() with anything you want.

Example: System.objects.using('remote').get(id=123456)
Eagllus
  • 437
  • 3
  • 8
2

You are correct that foreign keys across databases are a problem in Django ORM, and to some extent at the db level too.

You already have the answer basically: "I visualize a function inside the Document model/class which is able to retrieve the matching data"

I'd do it like this:

class RemoteObject(object):
    def __init__(self, remote_model, remote_db, field_name):
        # assumes remote db is defined in Django settings and has an
        # associated Django model definition:
        self.remote_model = remote_model
        self.remote_db = remote_db
        # name of id field on model (real db field):
        self.field_name = field_name
        # we will cache the retrieved remote model on the instance
        # the same way that Django does with foreign key fields:
        self.cache_name = '_{}_cache'.format(field_name)

    def __get__(self, instance, cls):
        try:
            rel_obj = getattr(instance, self.cache_name)
        except AttributeError:
            system_id = getattr(instance, self.field_name)
            remote_qs = self.remote_model.objects.using(self.remote_db)
            try:
                rel_obj = remote_qs.get(id=system_id)
            except self.remote_model.DoesNotExist:
                rel_obj = None
            setattr(instance, self.cache_name, rel_obj)
        if rel_obj is None:
            raise self.related.model.DoesNotExist
        else:
            return rel_obj

    def __set__(self, instance, value):
        setattr(instance, self.field_name, value.id)
        setattr(instance, self.cache_name, value)


class Document(models.Model:
    name = models.CharField(max_length=200)
    system_id = models.IntegerField()
    system = RemoteObject(System, 'system_db_name', 'system_id')

You may recognise that the RemoteObject class above implements Python's descriptor protocol, see here for more info:
https://docs.python.org/2/howto/descriptor.html

Example usage:

>>> doc = Document.objects.get(pk=1)
>>> print doc.system_id
3
>>> print doc.system.id
3
>>> print doc.system.name
'my system'
>>> other_system = System.objects.using('system_db_name').get(pk=5)
>>> doc.system = other_system
>>> print doc.system_id
5

Going further you could write a custom db router:
https://docs.djangoproject.com/en/dev/topics/db/multi-db/#using-routers

This would let you eliminate the using('system_db_name') calls in the code by routing all reads for System model to the appropriate db.

Anentropic
  • 32,188
  • 12
  • 99
  • 147
1

I'd go for a method get_system(). So:

class Document:
    def get_system(self):
       return System.objects.using('remote').get(system_id=self.system_id)

This is the simplest solution. A possible solution is also to use PostgreSQL's foreign data wrapper feature. By using FDW you can abstract away the multidb handling from django and do it inside the database - now you can use queries that need to use the document -> system relation.

Finally, if your use case allows it, just copying the system data periodically to the local db can be a good solution.

akaariai
  • 714
  • 3
  • 6