14

TLTR: Django does not include database names in SQL queries, can I somehow force it to do this or is there a workaround?

The long version:

I have two legacy MySQL databases (Note: I have no influence on the DB layout) for which I'm creating a readonly API using DRF on Django 1.11 and python 3.6

I'm working around the referential integrity limitation of MyISAM DBs by using the SpanningForeignKey field suggested here: https://stackoverflow.com/a/32078727/7933618

I'm trying to connect a table from DB1 to a table from DB2 via a ManyToMany through table on DB1. That's the query Django is creating:

SELECT "table_b"."id" FROM "table_b" INNER JOIN "throughtable" ON ("table_b"."id" = "throughtable"."b_id") WHERE "throughtable"."b_id" = 12345

Which of course gives me an Error "Table 'DB2.throughtable' doesn't exist" because throughtable is on DB1 and I have no idea how to force Django to prefix the tables with the DB name. The query should be:

SELECT table_b.id FROM DB2.table_b INNER JOIN DB1.throughtable ON (table_b.id = throughtable.b_id) WHERE throughtable.b_id = 12345

Models for app1 db1_app/models.py: (DB1)

class TableA(models.Model):
    id = models.AutoField(primary_key=True)
    # some other fields
    relations = models.ManyToManyField(TableB, through='Throughtable')

class Throughtable(models.Model):
    id = models.AutoField(primary_key=True)
    a_id = models.ForeignKey(TableA, to_field='id')
    b_id = SpanningForeignKey(TableB, db_constraint=False, to_field='id')

Models for app2 db2_app/models.py: (DB2)

class TableB(models.Model):
    id = models.AutoField(primary_key=True)
    # some other fields

Database router:

def db_for_read(self, model, **hints):
    if model._meta.app_label == 'db1_app':
        return 'DB1'

    if model._meta.app_label == 'db2_app':
        return 'DB2'

    return None

Can I force Django to include the database name in the query? Or is there any workaround for this?

hynekcer
  • 14,942
  • 6
  • 61
  • 99
masterfloda
  • 2,908
  • 1
  • 16
  • 27
  • Did you try running such a query from `mysql` client? Did it work? – phd Jul 10 '17 at 17:43
  • Yes, `SELECT table_b.id FROM DB2.table_b INNER JOIN DB1.throughtable ON (table_b.id = throughtable.b_id) WHERE throughtable.b_id = 12345` works. As I said, the problem is that Django doesn't add the DB names to the tables in the query. – masterfloda Jul 10 '17 at 17:56
  • Cross database relationship by Django is a hard problem. Many pull requests or proposals on Django's bug tracker have been rejected for clear reason, mainly because incomplete. (missing other backends, migrations, tests, fear of future issues, sustainability and complexity of core development for minor improvement) I found some SO questions too complex to expect any progress. This question is very realistic thanks to expectations about relationship only to legacy databases. It is a good starting point. I awarded bounty to Art for his PostgreSQL contribution. No answer is perfect yet here. – hynekcer Sep 23 '17 at 00:20

3 Answers3

8

A solution exists for Django 1.6+ (including 1.11) for MySQL and sqlite backends, by option ForeignKey.db_constraint=False and explicit Meta.db_table. If the database name and table name are quoted by ' ` ' (for MySQL) or by ' " ' (for other db), e.g. db_table = '"db2"."table2"'). Then it is not quoted more and the dot is out of quoted. Valid queries are compiled by Django ORM. A better similar solution is db_table = 'db2"."table2' (that allows not only joins but it is also by one issue nearer to cross db constraint migration)

db2_name = settings.DATABASES['db2']['NAME']

class Table1(models.Model):
    fk = models.ForeignKey('Table2', on_delete=models.DO_NOTHING, db_constraint=False)

class Table2(models.Model):
    name = models.CharField(max_length=10)
    ....
    class Meta:    
        db_table = '`%s`.`table2`' % db2_name  # for MySQL
        # db_table = '"db2"."table2"'          # for all other backends
        managed = False

Query set:

>>> qs = Table2.objects.all()
>>> str(qs.query)
'SELECT "DB2"."table2"."id" FROM DB2"."table2"'
>>> qs = Table1.objects.filter(fk__name='B')
>>> str(qs.query)
SELECT "app_table1"."id"
    FROM "app_table1"
    INNER JOIN "db2"."app_table2" ON ( "app_table1"."fk_id" = "db2"."app_table2"."id" )
    WHERE "db2"."app_table2"."b" = 'B'

That query parsing is supported by all db backends in Django, however other necessary steps must be discussed individually by backends. I'm trying to answer more generally because I found a similar important question.

The option 'db_constraint' is necessary for migrations, because Django can not create the reference integrity constraint
ADD foreign key table1(fk_id) REFERENCES db2.table2(id),
but it can be created manually for MySQL.

A question for particular backends is if another database can be connected to the default at run-time and if a cross database foreign key is supported. These models are also writable. The indirectly connected database should be used as a legacy database with managed=False (because only one table django_migrations for migrations tracking is created only in the directly connected database. This table should describe only tables in the same database.) Indexes for foreign keys can however be created automatically on the managed side if the database system supports such indexes.

Sqlite3: It has to be attached to another default sqlite3 database at run-time (answer SQLite - How do you join tables from different databases), at best by the signal connection_created:

from django.db.backends.signals import connection_created

def signal_handler(sender, connection, **kwargs):
    if connection.alias == 'default' and connection.vendor == 'sqlite':
        cur = connection.cursor()
        cur.execute("attach '%s' as db2" % db2_name)
        # cur.execute("PRAGMA foreign_keys = ON")  # optional

connection_created.connect(signal_handler)

Then it doesn't need a database router of course and a normal django...ForeignKey can be used with db_constraint=False. An advantage is that "db_table" is not necessary if the table names are unique between databases.

In MySQL foreign keys between different databases are easy. All commands like SELECT, INSERT, DELETE support any database names without attaching them previously.


This question was about legacy databases. I have however some interesting results also with migrations.

hynekcer
  • 14,942
  • 6
  • 61
  • 99
  • I've seen this trick with quotes used before... Looks amusing, however I only see it as the last resort. Does official Django documentation tells anything about "the dot is protected"? And, why do you need to set `db_constraint=False`? I thought MySQL supports cross-db foreign keys... – Art Sep 18 '17 at 13:45
  • I used `db_constraint=False` in order to can create the table Table1 (Throughtable), especially the field `fk`(`b_id`), automatically by migrations, in order to can use JOIN. Then the field can be eventually modified by SQL and migrations must be disabled also for Table2 by managed=False. My goal was to write a solution supported at least by two backends. I have an example with some if/else conditional code that works with also with migrations for MySQL and sqlite3. I tried to solve django.test.TestCase, but creating the test database and then cross db JOIN by the same run is really a problem. – hynekcer Sep 18 '17 at 17:24
  • You are right about the last resort... before what, before doing nothing? There is also a much better solution with "half" quoting `'db2"."table_b'` (more strange). It is anyway bad that the model depends by "db_table" not only on the backend (' \` ' vs. ` " `) but also on the low level database name (for MySQL) configured in DATABASES. Is it luck or bad luck that it is not a part of migrations? Migrations would be broken on Table1 after removing db_constraint=False. – hynekcer Sep 18 '17 at 17:28
  • The SQL can be easily fixed: `./manage.py sqlmigrate app 0001_initial`: the last line: `ALTER TABLE "app_table1" ADD CONSTRAINT "app_table1_rel_id_3d909b_fk_db2"_"app_table2_id" FOREIGN KEY ("rel_id") REFERENCES "db2"."app_table2" ("id");` (replaced all backticks by doublequotes due to nonproportional formating). If ...db2\`_\`app... is replaced by ...db2_app..., then it works. – hynekcer Sep 18 '17 at 17:45
  • The quoting trick solved my problem. It might not be the perfect solution for all backends, but for my case (legacy readonly MySQL DBs), it's good enough. THANKS! Note: I had to modify the DB router slightly to prefer the DB prefix in `model._meta.db_table` to the `app_label` if specified. – masterfloda Sep 18 '17 at 19:16
  • @hynekcer Instead of hardcoding the DB name, would it be better to simply get it from `settings.DATABASES['db1']['NAME']`? Or should this rather be avoided because it's against the separation of concerns? – masterfloda Sep 18 '17 at 21:13
  • @masterfloda Good idea for MySQL. It is anyway also hardcoded by the alias "db1"/"db2". Sqlite has the alias directly written. A nice source would be some DRY mapping used by both model and router. – hynekcer Sep 18 '17 at 21:33
  • @Art OK. I reworded "the dot is protected" because the obvious ancient intention of code authors was only to simple prevent quotes on a correctly quoted string. That is enough for all queries and generally for CRUD. The new code for migrations disregards it frequently and it adds suffixes in a simple way. – hynekcer Sep 19 '17 at 08:49
7

I have a similar setup with PostgreSQL. Utilizing search_path to make cross-schema references possible in Django (schema in postgres = database in mysql). Unfortunately, seems like MySQL doesn't have such a mechanism.

However, you might try your luck creating views for it. Make views in one databases that references other databases, use it to select data. I think it's the best option since you want your data read-only anyway.

It's however not a perfect solution, executing raw queries might be more useful in some cases.


UPD: Providing mode details about my setup with PostgreSQL (as requested by bounty later). I couldn't find anything like search_path in MySQL documentation.

Quick intro

PostgreSQL has Schemas. They are synonymous to MySQL databases. So if you are MySQL user, imaginatively replace word "schema" with word "database". Requests can join tables between schemas, create foreign keys, etc... Each user (role) has a search_path:

This variable [search_path] specifies the order in which schemas are searched when an object (table, data type, function, etc.) is referenced by a simple name with no schema specified.

Special attention on "no schema specified", because that's exactly what Django does.

Example: Legacy databases

Let's say we got coupe legacy schemas, and since we are not allowed to modify them, we also want one new schema to store the NM relation in it.

  • old1 is the first legacy schema, it has old1_table (which is also the model name, for convenience sake)
  • old2 is the second legacy schema, it has old2_table
  • django_schema is a new one, it will store the required NM relation

All we need to do is:

alter role django_user set search_path = django_schema, old1, old2;

This is it. Yes, that simple. Django has no names of the schemas ("databases") specified anywhere. Django actually has no idea what is going on, everything is managed by PostgreSQL behind the scenes. Since django_schema is first in the list, new tables will be created there. So the following code ->

class Throughtable(models.Model):
    a_id = models.ForeignKey('old1_table', ...)
    b_id = models.ForeignKey('old2_table', ...)

-> will result in a migration that creates table throughtable that references old1_table and old2_table.

Problems: if you happened to have several tables with same names, you will either need to rename them or still trick Django into using a dot inside of table names.

Art
  • 2,235
  • 18
  • 34
  • I found some links [(1)](https://www.amvtek.com/blog/posts/2014/Jun/13/accessing-multiple-postgres-schemas-from-django/), [(2)](https://stackoverflow.com/questions/1160598/how-to-use-schemas-in-django), [(3)](https://stackoverflow.com/questions/41452195/specify-postgresql-schema-in-django) and a great discussion in an obsoleted pull request [#6148](https://code.djangoproject.com/ticket/6148) with links to newer PR. A common denominator is that they solve only access to different schemas in PostgreSQL, but no JOIN or are rejected. – hynekcer Sep 19 '17 at 08:09
  • Can you demonstrate your solution for Postgres? (Is it DRY? What all you need to do if a new normal field is added? Must be created manually by SQL or used in two models? Can you generate a necessary SQL from Django?) (bounty?) – hynekcer Sep 19 '17 at 08:14
  • @hynekcer upd. As for "what if new field / new model" questions - as long as you only perform migrations for tables in the schema listed first in `search_path`, django just makes normal migrations, afterwards postgres resolves what django attempts to reference. No extra actions from my side. – Art Sep 19 '17 at 17:54
3

Django does have the ability to work with multiple databases. See https://docs.djangoproject.com/en/1.11/topics/db/multi-db/.

You can also use raw SQL queries in Django. See https://docs.djangoproject.com/en/1.11/topics/db/sql/.

WahhabB
  • 520
  • 3
  • 8
  • I know about combining multiple databases and was also able to work around the referential integrity limitation of MyISAM. Raw SQL might be an option but I wanted to avoid that if possible. This ManyToMany relation is only one part of a quite complicated combination of DRF ModelSerializers and querysets and it would render them mostly useless. – masterfloda Jul 10 '17 at 20:20