How to determine the sequence order of SQL inserts within multiple foreign keys?

Question

I recently started with Django and haven't stopped enjoying Python/Django yet, but I'm currently struggling with a logical problem.

Situation (simplified):

class A(models.Model):
    foo = models.CharField(max_length=255)

class B(models.Model):
    bar = models.CharField(max_length=255)
    foo = models.ForeignKey(A)

class C(models.Model):
    title = models.CharField(max_length=255)
    bar = models.ForeignKey(B)

class D(models.Model):
    name = models.CharField(max_length=255)
    title = models.ForeignKey(C)
    bar = models.ForeignKey(B)

(The real use case consists of hundreds of these classes. Yes, it's a mess and it clearly proves a bad database design, but I can't change anything about that.)

I've created dynamic ModelForms for every class. The general purpose is to read an Excel file and feed its rows into the right ModelForms, with field validation etc. Every Excel file has multiple sheets that map to the classes; the first row (header) names the model fields and all other rows hold the data.

The data comes completely unsorted. The insert order that doesn't break the foreign key sequence would be A => B => C => D, but the sheets could just as well arrive as D => B => C => A. The problem strikes when I validate the first sheet, D, which fails validation because the rows its foreign keys point to haven't been inserted yet.

The question is, how can I add all data and verify the referential integrity afterwards?

Thanks in advance!

Thanks for your help!

Actually, all primary keys are derived from the root model, which holds the mapping table for all child tables. I didn't mention it in the first post because I wanted to keep the situation simple. Having said that, I can't change that (mess!), nor can I redesign the classes, as they map to an existing (messy!) database. And to make the mess complete, every field is set to NOT NULL.

My second idea was to initially fill a mapping table (no real idea how to do that yet) and sort the incoming data by it. It sounds like monkey work, it's dirty, and I don't like the idea myself; I hoped there were smarter ways.

Do you have any hints on any mathematical solutions to this problem? It's like spanning a tree on arbitrary data.

UPDATE:

I wrote two functions to solve this; I haven't tested the error handling yet.

validate_tables: Finds all models of the given app and stores each model's foreign key parents in a dict (self.found_fields) of the form child: [parent, parent, (...)].

gen_sequence: Fills a list (self.sequence) with the object_names in an order that satisfies the foreign keys.

Improvements welcome!

This is my current solution (snippet to get the idea):

    # Assumes the Django 1.x-era imports this snippet relies on:
    #   from django.db import models
    #   from django.db.models import get_app, get_models

    def validate_tables(self):
        app = get_app("testdata")
        self.sequence = []
        self.found_fields = {}
        # Build {child: [parent, parent, ...]} from every ForeignKey field.
        for model in get_models(app):
            hits = []
            for local_field in model._meta.local_fields:
                if isinstance(local_field, models.ForeignKey):
                    hits.append(local_field.related.parent_model._meta.object_name)
            self.found_fields[model._meta.object_name] = hits
        if self.gen_sequence():
            return True
        # A dict cannot be raised, so wrap the unresolved models in an exception.
        raise ValueError("Unresolvable foreign keys: %r" % self.sequence_errors)


    def gen_sequence(self, unresolved=None):
        # On recursive calls, only the still-unresolved models are examined.
        if unresolved:
            self.found_fields = unresolved
        unresolved = {}

        for model in self.found_fields:
            parents = self.found_fields[model]
            # A model is ready once all of its parents are in the sequence
            # (trivially true for models without foreign keys).
            if all(parent in self.sequence for parent in parents):
                self.sequence.append(model)
            else:
                unresolved[model] = parents

        if not unresolved:
            return True
        if unresolved == self.found_fields:
            # No progress in this pass: circular or missing parents.
            self.sequence_errors = unresolved
            return False
        return self.gen_sequence(unresolved)
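
For comparison, ordering models so that every parent comes before its children is a topological sort of the child-to-parents mapping. Below is a minimal sketch, assuming Python 3.9+ for the standard library graphlib module. The hand-written dict (shaped like self.found_fields) and the name insert_order are illustrative only and not part of the code above:

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical child -> parents mapping for the simplified models A..D,
# deliberately listed in the "wrong" order.
found_fields = {
    "D": ["C", "B"],
    "B": ["A"],
    "C": ["B"],
    "A": [],
}


def insert_order(dependencies):
    """Return the names so that every parent precedes its children."""
    sorter = TopologicalSorter(dependencies)
    try:
        return list(sorter.static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle.
        raise ValueError("circular foreign keys: %r" % (exc.args[1],))


sequence = insert_order(found_fields)
print(sequence)  # → ['A', 'B', 'C', 'D']
```

Circular foreign keys have no valid insert order at all; the sketch reports them as a ValueError instead of looping.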

Is this a one time import? Or will you be doing this on a regular basis? — Rob Osborne, Jan 15 '13 at 21:02
Ouch. Yeah, I would think about either doing what I suggested or having a looser connection than foreign keys (i.e. always look up the values). Also, in a pinch, you could use the giant lookup table in the form of model inheritance, establish the connections using the base object, and then instantiate the specifics later: https://docs.djangoproject.com/en/dev/topics/db/models/#model-inheritance — Rob Osborne, Jan 15 '13 at 21:30

score 0 · Answer 1 · edited May 23 '17 at 12:18

You will need to define your own primary keys (I presume you have a suitable field, or this problem would not occur) and also allow the ForeignKey to be null. The hard part will be establishing referential integrity later, which is difficult but seemingly not impossible to do in Django.

Instead, I would have two fields: one holding your virtual primary key, and your current foreign key, made nullable:

class A(models.Model):
    foo = models.CharField(max_length=255)

class B(models.Model):
    bar = models.CharField(max_length=255)
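    foo = models.ForeignKey(A, null=True)
    # Temporary lookup key; CharField requires max_length, and null=True
    # lets it be cleared once foo has been resolved.
    foo_key = models.CharField(max_length=255, null=True)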

Then, after the data import, find all 'B' objects with a foo_key, establish the relationship, and set the foo_key to null.
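
As a rough illustration of this two-pass idea outside of Django, here is a minimal sketch using Python's standard sqlite3 module. The table layout, the sample rows, and the assumption that a.foo serves as the lookup key are all made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, foo TEXT UNIQUE);
    CREATE TABLE b (
        id INTEGER PRIMARY KEY,
        bar TEXT,
        foo_id INTEGER REFERENCES a(id),  -- nullable during the import
        foo_key TEXT                      -- temporary lookup key
    );
""")

# Pass 1: rows arrive in the "wrong" order, children before parents.
conn.executemany(
    "INSERT INTO b (bar, foo_key) VALUES (?, ?)",
    [("b1", "x"), ("b2", "y"), ("b3", "missing")],
)
conn.executemany("INSERT INTO a (foo) VALUES (?)", [("x",), ("y",)])

# Pass 2: resolve each temporary key into a real foreign key ...
conn.execute("""
    UPDATE b
       SET foo_id = (SELECT a.id FROM a WHERE a.foo = b.foo_key)
     WHERE foo_key IS NOT NULL
""")
# ... and clear the temporary key wherever the lookup succeeded.
conn.execute("UPDATE b SET foo_key = NULL WHERE foo_id IS NOT NULL")

# Integrity check: anything still without a parent is a dangling reference.
dangling = [
    row[0]
    for row in conn.execute("SELECT bar FROM b WHERE foo_id IS NULL ORDER BY bar")
]
print(dangling)  # → ['b3']
```

Rows that keep a foo_key after the second pass are exactly the ones whose parent never showed up, so they can be reported instead of silently breaking referential integrity.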

This is the mechanism I used when importing a large amount of data from a former GAE project to a PostgreSQL database.
