
In Django, how do I deal with concurrent changes to the Images associated with a Post object?

This is a flavour of question that has been asked before, but not quite covering the same issues. I've read through these (question, question, question, and question) but the issue is slightly different.

I have a blog post model (pseudocode for speed), which contains title, abstract and body, and associated Images.

class Post(models.Model):
    title = CharField
    abstract = TextField
    body = TextField

class Image(models.Model):
    post = ForeignKey(Post)
    imagefile = ImageField

Now, what I want to add is the ability to store histories of the changes to this Post model. I've thought of two possibilities for this:

Possibility 1

class PostHistory(models.Model):
    post = ForeignKey(Post)
    title_delta = TextField
    abstract_delta = TextField
    body_delta = TextField

However, this has the issue that it stores deltas even where nothing changed (for example, when the title does not change and there is only a delta for the body field). That said, when more than one field changes, it fits the idea that '1 revision == 1 complete revision'.

Possibility 2

class PostRevision(models.Model):
    post = ForeignKey(Post)
    field = CharField #Field name
    delta = TextField

Either approach gives me a history of diffs for each field, which I would generate using diff-match-patch (slightly more performant than the built-in difflib). The two issues I now have relate to the generation of master objects (i.e. the top revision in the chain).
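As a rough illustration of Possibility 2, the sketch below records a delta only for the fields that actually changed. It uses the standard-library difflib instead of diff-match-patch so that it runs without extra packages. The helper names `field_deltas` and `apply_delta`, and the plain dicts standing in for model instances, are assumptions made for illustration only.

```python
import difflib

FIELDS = ("title", "abstract", "body")


def field_deltas(old, new):
    """Return {field_name: delta} for only the fields that differ."""
    deltas = {}
    for name in FIELDS:
        if old[name] != new[name]:
            # ndiff produces a line-based delta that difflib.restore can replay.
            deltas[name] = list(difflib.ndiff(
                old[name].splitlines(keepends=True),
                new[name].splitlines(keepends=True),
            ))
    return deltas


def apply_delta(delta):
    """Rebuild the newer text from an ndiff delta."""
    return "".join(difflib.restore(delta, 2))


old = {"title": "Hello", "abstract": "Intro", "body": "First line.\nSecond line.\n"}
new = {"title": "Hello", "abstract": "Intro", "body": "First line,\nSecond line.\n"}

# Only the body changed, so only the body gets a delta.
deltas = field_deltas(old, new)
```

Note that an ndiff delta carries both versions of each changed line, so it is not a space-saving format. It only shows the bookkeeping of one stored row per changed field.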

The question being asked is: How do I deal with concurrent changes to the Images associated with a Post object? These would be changed via references within the body field of the Post model (this is a Markdown formatted text field which is then edited on POST of the form to add in the URL references for the image field). Is the best way to deal with this to use an M2M field on the revision, and on the Post object, allowing the images to be always stored with the PostRevision object?

culix
jvc26
  • Maybe I read badly or misunderstood, but did you want to save each field's revision or the object as a whole? – Rickard Zachrisson Jan 11 '13 at 11:31
  • Well, this is part of the dilemma: do I save diffs for the entire object each time (possibility 1), where for many of the fields this will essentially be 'nil changed', or save the diffs only for the individual fields that change (possibility 2)? – jvc26 Jan 11 '13 at 11:47
  • If a revision has the same title as another revision, would that be bad? – Rickard Zachrisson Jan 11 '13 at 11:51
  • No, it is a perfectly acceptable situation. Assuming that I'd opt for Possibility 1, then that clears out the issues of matching deltas, fields and objects and keeps it quite atomic - one revision is one revision. The subsequent issue (tying in the image fields), I would probably solve that with M2Ms between the images and the revision objects, does that sound sensible? – jvc26 Jan 11 '13 at 11:57
  • There is an application to do that: https://django-simple-history.readthedocs.io/en/2.7.2/quick_start.html – dhill May 09 '19 at 09:51

2 Answers


I agree with @rickard-zachrisson that you should stick to approach #1. I'd make a few subtle changes though (pseudo code btw):

class AbstractPost(models.Model):
    title = CharField
    abstract = TextField
    body = TextField

    class Meta:
        abstract = True


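class Post(AbstractPost):
    def save(self, *args, **kwargs):
        # Model.save() returns None, so save first and then use self.
        super(Post, self).save(*args, **kwargs)

        # Record a full snapshot of the version that was just saved.
        PostHistory.objects.create(
            post=self,
            title=self.title,
            abstract=self.abstract,
            body=self.body,
        )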


class PostHistory(AbstractPost):
    post = ForeignKey(Post)

    class Meta:
        ordering = ['-pk']


class Image(models.Model):
    post = ForeignKey(Post)
    imagefile = ImageField

Your latest version will always be in Post and your change history is in pk order in PostHistory which is easy to diff against for changes. I'd duplicate the data because storage is cheap and storing deltas is a pita. If you have multiple edits or want to compare the current version to the original version then deltas are basically useless. Any model changes in AbstractPost are reflected in both Post and PostHistory.
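To illustrate why full snapshots are easy to work with, here is a minimal sketch that diffs any two stored versions directly, without replaying intermediate deltas. The list of dicts stands in for PostHistory rows, and `diff_versions` is a hypothetical helper, not part of Django.

```python
import difflib

# Stand-ins for PostHistory rows, oldest first.
history = [
    {"body": "Django is a web framework.\nIt is written in Python.\n"},
    {"body": "Django is a web framework.\nIt is written in Python 2.\n"},
    {"body": "Django is a Python web framework.\nIt is written in Python 3.\n"},
]


def diff_versions(rows, a, b, field="body"):
    """Return a unified diff between two arbitrary snapshots."""
    return "".join(difflib.unified_diff(
        rows[a][field].splitlines(keepends=True),
        rows[b][field].splitlines(keepends=True),
        fromfile="version %d" % a,
        tofile="version %d" % b,
    ))


# Compare the original with the latest version in one step.
print(diff_versions(history, 0, len(history) - 1))
```

With chained deltas, the same comparison would mean applying every intermediate patch first.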

Image is keyed to Post so things stay tidy. You can optionally clean up images in your Post.save() function but I'd probably opt for a post_save signal to keep the code cleaner.
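One way such a clean-up could work, given that the question says images are referenced from the Markdown body, is to collect the image URLs still mentioned in the versions you keep and treat the rest as orphans. This is a sketch under assumptions: the regex only handles the simple `![alt](url)` form, and `referenced_images` and `orphaned_images` are hypothetical helpers that a post_save handler might call.

```python
import re

# Matches simple inline Markdown images: ![alt text](url "optional title")
IMAGE_RE = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)')


def referenced_images(bodies):
    """Collect every image URL mentioned in any of the given bodies."""
    urls = set()
    for body in bodies:
        urls.update(IMAGE_RE.findall(body))
    return urls


def orphaned_images(stored_urls, bodies):
    """Return stored image URLs that none of the bodies refer to."""
    return set(stored_urls) - referenced_images(bodies)


current_body = "Intro ![cat](/media/cat.png) and ![dog](/media/dog.png)"
old_body = "Intro ![cat](/media/cat.png) and ![bird](/media/bird.png)"
stored = ["/media/cat.png", "/media/dog.png", "/media/bird.png", "/media/fish.png"]

# Looking only at the current version, bird and fish are unused.
print(sorted(orphaned_images(stored, [current_body])))
# Also scanning historical versions keeps bird, which an old revision still uses.
print(sorted(orphaned_images(stored, [current_body, old_body])))
```

Whether to scan only the current body or every PostHistory body depends on whether old revisions must stay renderable.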

Jeff Triplett
  • Thanks for the above - Regarding duplicating the data rather than diffs, would you still say this when the text can be several thousand words long, and some changes may be as small as changing a full stop to a comma? – jvc26 Jan 11 '13 at 17:34
  • I would not pre-optimize until you have more data or storage space becomes a concern (perhaps compress the data then). It comes down to how frequently you will compare one version to another. If your diffs track changes from one version to the next, then you have quite a bit of data to read and process just to compare two versions. If you just store diffs off the original, then you lose more storage space as your diffs get big. It's a trade-off either way. – Jeff Triplett Jan 18 '13 at 14:56
  • `super(Post, self).save()` will not return `post` instance. – Anshul Tiwari Jun 13 '21 at 17:55

I think you should stick with option 1.

An idea would be to have an automated revision system. Here is how I would do it; please excuse any syntax errors, as I'm typing this from memory.

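class A(models.Model):
    field1 = ...
    field2 = ...

    def save(self, *args, **kwargs):
        # Save first so that self has a primary key for the revision's foreign key.
        super(A, self).save(*args, **kwargs)
        if bla_bla_updated:  # placeholder for your own "has anything changed?" check
            A_revision.objects.create(
                field1=self.field1, field2=self.field2, a=self)

class A_revision(models.Model):
    field1 = ...
    field2 = ...
    a = models.ForeignKey(A)
    revision = models.IntegerField()

    def save(self, *args, **kwargs):
        if self.pk is None:
            # Number new revisions per parent object, starting at 1.
            latest = (A_revision.objects.filter(a=self.a)
                                        .order_by('-revision').first())
            self.revision = latest.revision + 1 if latest else 1
        super(A_revision, self).save(*args, **kwargs)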