2

In SO question 7531153, I asked the proper way to split a Django model into two—either using Django's Multi-table Inheritance or explicitly defining a OneToOneField.

Based Luke Sneeringer's comment, I'm curious if there's a performance gain from splitting the model in two.

The reason I was thinking about splitting the model in two is because I have some fields that will always be completed, while there are other fields that will typically be empty (until the project is closed).

Are there performance gains from putting typically empty fields, such as actual_completion_date and actual_project_costs, into a separate model/table in Django?

Split into Two Models

class Project(models.Model):
    project_number = models.SlugField(max_length=5, blank=False,
            primary_key=True)
    budgeted_costs = models.DecimalField(max_digits=10, decimal_places=2)
    submitted_on = models.DateField(auto_now_add=True)

class ProjectExtendedInformation(models.Model):
    project = models.OneToOneField(CapExProject, primary_key=True)
    actual_completion_date = models.DateField(blank=True, null=True)
    actual_project_costs = models.DecimalField(max_digits=10, decimal_places=2,
            blank=True, null=True)
Community
  • 1
  • 1
Matthew Rankin
  • 457,139
  • 39
  • 126
  • 163

2 Answers2

7

Actually, quite the opposite. Any time multiple tables are involved, a SQL JOIN will be required, which is inherently slower for a database to perform than a simple SELECT query. The fact that the fields are empty is meaningless in terms of performance one way or another.

Depending on the size of the table and the number of columns, it may be faster to only select a subset of fields that you need to interact with, but that's easy enough in Django with the only method:

Project.objects.only('project_number', 'budgeted_costs', 'submitted_on')

Which produces something akin to:

SELECT ('project_number', 'budgeted_costs', 'submitted_on') FROM yourapp_project;

Using separate models (and tables) only makes sense for the purposes of modularization -- such that you subclass Project to create a specific kind of project that requires additional fields but still needs all the fields of a generic Project.

Chris Pratt
  • 232,153
  • 36
  • 385
  • 444
  • Indeed, the only advantage is structural (as long as it makes sense). For example, if the extended information may be shared between multiple projects, then there are structural advantages to reflect this possibility in the model. A one-to-one field obviously prevents this, though. – André Caron Oct 13 '11 at 22:10
  • 1
    You're recommending a premature optimization, IMO. – culebrón Oct 13 '11 at 22:27
  • What "optimization" did I recommend that is "premature"? – Chris Pratt Oct 14 '11 at 14:22
  • You generalize that JOINs are bad. They are bad in most cases, but sometimes are convenient. – culebrón Oct 14 '11 at 21:32
  • I said nothing of the sort. I said a JOIN is inherently slower than a simple SELECT on one table. And it is. Period, no arguments. How much slower and whether that's a problem is a case by case determination. The OPs question was does it add to performance, though, and the answer to that is an obvious no. It's at the very least negligibly slower -- certainly not faster. – Chris Pratt Oct 14 '11 at 21:38
7

For your case, if there's some info that's available only when it's closed, I'd indeed advise making a separate model.

Joins aren't bad. Especially in your case the join will be faster if you have all rows in one table and much fewer rows in the other one. I've worked with databases a lot, and in most cases it's a pure guess to tell if a join will be better or worse. Even a full table scan is better than using an index in many cases. You need to look at the EXPLAINs, if performance is a concern, and profile the Db work if possible (I know Oracle supports this.) But before performance becomes an issue, I prefer quicker development.

We have a table in Django with 5M rows. And we needed a column that would have been not null only for 1K rows. Just altering the table would have taken half a day. Rebuilding from scratch also takes a few hours. We've chosen to make a separate model.

I've been to a lecture on Domain Driven Design in which the author explained that it is important, especially in development of a new app, to separate models, to not stuff everything in one class.

Let's say you have a CargoAircraft class and PassengerAircraft. It's so tempting to put them in one class and work "seamlessly", isn't it? But interactions with them (scheduling, booking, weight or capacity calculations) are completely different.

So, by putting everything in one class you force yourself to bunch of IF clauses in every method, to extra methods in Manager, to harder debugging, to bigger tables in the DB. Basically you make yourself spend more time developing for the sake of what? For only two things: 1) fewer joins 2) fewer class names.

If you separate the classes, things go much easier:

  • clean code, no ugly ifs, no .getattr and defaults
  • easy debugging
  • more mainainable database

hence, faster development.

djvg
  • 11,722
  • 5
  • 72
  • 103
culebrón
  • 34,265
  • 20
  • 72
  • 110
  • Separation that makes sense is good, obviously. However, the author in this case, is only divvying the models because some of the fields will be blank. Having a second model that is merely "extended information" on the first, is BAD design. – Chris Pratt Oct 14 '11 at 14:21
  • This may be true, but in this case they can quickly grow to bigger models, and I wanted to tell the Matthew that DDD is quite reasonable and it's better to remember it. It was partially a response to you telling that JOINs are bad. – culebrón Oct 14 '11 at 21:35
  • I'll give you benefit of assuming you haven't read my comment on my answer, yet, but let me re-iterate, I never said JOINs are bad. Never even came close to saying that. – Chris Pratt Oct 14 '11 at 21:42
  • +1 for "by putting everything in one class you force yourself to bunch of IF clauses in every method" – arjuncc May 31 '13 at 09:14