7

I have a model called Document, and I want to add a new table, DocumentCluster that sits above it, with a foreign key to Document.

class DocumentCluster(models.Model):
    sub_document = models.ForeignKey(Document)
    ...lots of fields here...

When I add this table using South, I need to fill it in by setting the primary key and the foreign key to the same value.

For example, if I currently have a Document object with a pk of 12, the new DocumentCluster object will have a pk of 12 and a foreign key to Document number 12.

While it may seem strange that we need the DocumentCluster pk values to match the foreign key values there is an important reason. We use the Document pk in our URLs, but after the change the URLs will load a DocumentCluster, not a Document, so we'll need the pk in DocumentCluster to be the same as it was in Document.

Once that's done, I want the PK of the DocumentCluster to be an AutoField, incrementing from the highest value that was migrated.

Can this be done?

nmgeek
  • 2,127
  • 1
  • 23
  • 31
mlissner
  • 17,359
  • 18
  • 106
  • 169
  • 1
    Why do you need to set the pk? As a natural key it should be unrelated to the content. You have the fk to determine the relationship. – Daniel Roseman Apr 24 '15 at 19:46
  • We use the PK in our URLs, but after the change the URLs will load a `DocumentCluster`, not a `Document`, so we'll need the PK to be the same as it was previously. – mlissner Apr 24 '15 at 20:01

2 Answers2

1

You cannot have a FK from DocumentCluster.pk to Document and at the same time make DocumentCluster.pk a serial column. That's a contradiction. It would have to be two separate columns. I suppose you mean the other way round: from Document to DocumentCluster.

This can be done with SQL DDL commands:

BEGIN;

CREATE TABLE DocumentCluster (pk serial, lots text, of_fields int, ...);

INSERT INTO DocumentCluster (pk, lots, of_fields, ...)
SELECT pk, lots, of_fields, ... FROM Document
ORDER  BY pk; -- optional

ALTER TABLE DocumentCluster ADD CONSTRAINT DocumentCluster_pkey
PRIMARY KEY (pk);  -- could be integrated in CREATE TABLE

ALTER TABLE Document  -- and not the other way round!
ADD   CONSTRAINT pk_fk FOREIGN KEY (pk) REFERENCES DocumentCluster(pk);

SELECT setval(pg_get_serial_sequence('DocumentCluster', 'pk'), max(pk))
FROM   DocumentCluster;  -- update SEQUENCE to highest value

COMMIT;
  • Your obfuscation layer (Django) may be using double-quoted names like "Document", in which case you'd have to do the same ...

  • About setval().

  • About pg_get_serial_sequence()

  • The PK could be declared in the CREATE TABLE statement, but it's cheaper to add it after rows have been inserted - assuming pk is unique not null.

More explanation and links:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I think your SQL is correct but there is no reason mlissner cannot have both a serial pk and the sub_document foreign key in the DocumentCluster model. And I'm not sure what you are referring to by 'Obfuscation layer'. The SQL would be more clear if you added in the sub_document field/column. – nmgeek May 05 '15 at 17:14
  • @nmgeek: Like I wrote: it would have to be two separate columns - like you suggest, too. In theory the referencing column can be a `serial`, too. But it hardly makes sense to draw values from a sequence, while they must be a reference to the key in another table. I clarified a bit. I did not add the `sub_document` field because I don't think its needed after I reversed the direction of the FK. – Erwin Brandstetter May 05 '15 at 22:28
  • As I said in the original question, I am really hoping to do this as much as possible with South so I can have all the benefits it brings. If there's a way I can migrate the data using South, then update the pk to be serial, that'd be my preferred solution. – mlissner May 05 '15 at 22:38
1

Using south this is much simpler than I thought it would be.

Use the south schemamigration command as usual to build the schema migration that adds the DocumentCluster model.

Then use the datamigration command to build a skeleton for the next migration:

./manage.py datamigration yourapp populate_clusters

Then fill in the forwards method in the resulting python migration file so it looks like:

def forwards(self, orm):
    max_id = -1
    clusters_added = 0                                                                                                         
    for document in orm.Document.objects.all():
        cluster = orm.DocumentCluster()
        cluster.id = document.id
        cluster.sub_document = document
        cluster.save()
        max_id = max(max_id, document.id)
        clusters_added +=1
    if max_id >= clusters_added:
        orm.DocumentCluster.objects.raw("SELECT "\
           "setval(pg_get_serial_sequence('yourapp_documentcluster',"\
                                         "'id'), %d)" % max_id+1)

(The reverse method in this datamigration would simply delete all the DocumentModel instances.)

If this 'populate_clusters' migration step is run right after the migration step that added the DocumentCluster table then the DocumentCluster table is empty and the serial/autokey counter starts at zero. If you never deleted any Documents then the serial/autokey counter value will end up at the next unused value for Document pk values and you don't even have to bump it up as shown in Erwin's answer.

If, however, you have deleted Document instances or if somehow an id value was skipped then you'll need to bump up that serial/autokey counter using SQL. Django provides a way to directly execute raw SQL. (I don't think you can do without using raw SQL).

For safety sake you should check if cluster.id is <= the maximum document.id value seen in the loop.

nmgeek
  • 2,127
  • 1
  • 23
  • 31
  • This looks like a much better answer for my purposes. I'm surprised Django will let you manually set the value of a primary key, but if that's all it takes, then we're in business! – mlissner May 06 '15 at 00:03
  • While working on something else, I found that Django actually has documentation on manually setting auto primary keys. https://docs.djangoproject.com/en/1.8/ref/models/instances/#explicitly-specifying-auto-primary-key-values – mlissner May 12 '15 at 01:15