
A colleague of mine asked this question a while ago. I'm now posting it publicly along with my own answer, both for future reference and to learn from the community's answers.


I would like to use some kind of a database layer in the app. Most of the applications use ORM and it is possible to build complex queries there. What if I don’t want to use a query builder and prefer encapsulating it in a function or in a class? For example instead of:

def get(category_id: int) -> HttpResponse:
    posts = list(
        Post.objects
            .filter(category_id=category_id)
            .filter(deleted_at__isnull=True)
            .filter(published_at__gte=yesterday)
    )
    return HttpResponse(posts)

I'd like to do something along the lines of:

def recent_posts_from_category(category_id: int) -> List[Post]:
    return list(
        Post.objects
            .filter(category_id=category_id)
            .filter(deleted_at__isnull=True)
            .filter(published_at__gte=yesterday)
    )

def get(category_id: int) -> HttpResponse:
    posts = recent_posts_from_category(category_id)
    return HttpResponse(posts)

How would you call this approach/pattern? Where would you put the code?

Creating a module or a namespace called database sounds too broad. I don't want to put these functions into utils or helpers namespaces either, because they are clearly not utilities.

Does the term Repository fit here? I wouldn't go so far as to encapsulate everything (reads and writes), or to use ValueModels instead of ORM models (ActiveRecord) with the abstract goal of being able to replace the ORM if needed. Building custom units of work is also out of scope.

But I'm looking for some DRY helpers, you know, to avoid searching the whole repository in order to find all similar use-cases if I need to change the behavior slightly. To avoid duplicating the chain of ORM calls.

sepehr

1 Answer


The subject you're interested in, in the words of Fowler's P of EAA, is called "Datasource Architectural Patterns". So, what is a datasource layer, and what are the common architectural patterns for it?

Datasource Layer

I would like to use some kind of a database layer in the app. Most of the applications use ORM and it is possible to build complex queries there. What if I don’t want to use a query builder and prefer encapsulating it in a function or in a class

Broadly speaking, as you might already know, you're looking to implement a Data Access Layer (DAL) that the rest of your application will interface with. It serves as a layer of indirection that not only helps decouple from the infrastructure (a specific database or ORM framework) but also helps remove duplication. It's also called a "Datasource Layer".

Datasource Architectural Patterns

Implementation-wise, such a layer consists of a set of objects, or as you've mentioned, functions. Let's first talk functions.

"Layering can occur at any of these levels. A small program may just put separate functions for the layers into different files. A larger system may have layers corresponding to namespaces with many classes in each." — PresentationDomainDataLayering, Fowler


Transaction Script

How would you call this approach/pattern? Where would you put the code?

In your specific example, where a function is used to wrap the AR object's chain of method calls, the closest pattern I can think of is the "Transaction Script". Not a datasource-specific pattern, but a generic business logic encapsulation approach; one which is neither functional, nor object-oriented; rather a procedural one.

Such functions can probably be organized in a module named after your data encapsulation architectural layer, whatever you'd like to call it. Naming candidates could range from datasource.py to dal.py and other subjectively good names. Since this pattern is not usual within OO codebases, I don't think you can easily find a standard, community-accepted name for it.

The pattern's simplicity may work well for small codebases. In large, data-intensive codebases that deal with many queries, however, it is more likely to cause maintenance and decision-making problems down the road than the more modern, widely accepted patterns out there.

Also, I personally prefer to keep things in a consistent style. If Django comes with OO models, I would keep things OO. Such orphan non-util functions in an OO setting would be unfamiliar to most OO programmers who join your project in the future.

Let's talk objects now. In OO codebases, such objects form an on-demand "virtual object database" in memory. These objects are often implemented following datasource architectural patterns belonging to both the "Object Relational Mapping" and the more generic "DAO" pattern (simply: separate data access concern) categories. Let's try to scratch their surface.


ActiveRecord & DataMapper

They took all communities by storm. You know all about it. So, nothing to say here.

The only little note I'd like to build upon is that, as you've experienced, they only buy us "some decoupling": their usually expressive FluentInterface almost always needs a "home" to reside in so that we can avoid duplication and increase maintainability.

That home would also host any rare performance-sensitive query-building logic that might not fit well with standard ORMs. Such a home helps us draw a clear data access boundary.

For the more curious, see:
Flask SQLAlchemy Data Mapper vs Active Record Pattern


Repository

Does the term Repository fit here?

It very well does. A repository object per Django model could serve as that "home".

A set of repositories acts as a layer of indirection that mediates between the domain layer and the datasource logic. It's usually the front facade of your DAL that mainly implements the DAO pattern and "wraps over" underlying ORM patterns (like AR or DM) or sometimes even vanilla traditional queries.

"A system with a complex domain model often benefits from a layer, such as the one provided by Data Mapper (165), that isolates domain objects from details of the database access code. In such systems it can be worthwhile to build another layer of abstraction over the mapping layer where query construction code is concentrated. This becomes more important when there are a large number of domain classes or heavy querying. In these cases particularly, adding this layer helps minimize duplicate query logic. A Repository mediates between the domain and data mapping layers, acting like an in-memory domain object collection." — PoEAA, Fowler

In languages with native interface support each repository implementation is usually named after its source (e.g. UserDjangoRepository or UserLdapRepository or UserMongoRepository) and resides behind a generic UserRepositoryInterface that the rest of the application consumes (Interface resides in the domain layer, while the concretes are pushed to the infrastructure layer).
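Python has no native interface keyword, but the same split can be approximated with typing.Protocol. The following is only a minimal sketch under stated assumptions: the Post dataclass, the InMemoryPostRepository class and the list_recent function are hypothetical stand-ins invented for illustration (no Django involved), and the one-day cutoff mirrors the "yesterday" filter from the question.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Post:
    """Plain stand-in for a domain object (hypothetical, not a Django model)."""
    id: int
    category_id: int
    published_at: datetime
    deleted_at: Optional[datetime] = None


class PostRepository(Protocol):
    """Domain-layer interface: the only thing client code depends on."""

    def recent_in_category(self, category_id: int, since: datetime) -> List[Post]:
        ...


class InMemoryPostRepository:
    """Infrastructure-layer implementation backed by a plain list."""

    def __init__(self, posts: List[Post]) -> None:
        self._posts = list(posts)

    def recent_in_category(self, category_id: int, since: datetime) -> List[Post]:
        # Same three conditions as the ORM chain in the question.
        return [
            post for post in self._posts
            if post.category_id == category_id
            and post.deleted_at is None
            and post.published_at >= since
        ]


def list_recent(repo: PostRepository, category_id: int, now: datetime) -> List[Post]:
    """Client code: knows the interface, not the storage behind it."""
    return repo.recent_in_category(category_id, since=now - timedelta(days=1))
```

A Django-backed class with the same method could be swapped in without touching list_recent, which is the point of keeping the interface in the domain layer.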

Just like a "Table Data Gateway" object we're going to explore below, a Repository is also a representation of multiple records. The difference is that repositories tend to be a more generic, collection-like abstraction over the data having no notion of database operations (e.g. latest() vs. order_by_date_desc()).

Implementation
## app/repositories.py

from app.models import Post

class PostRepository(AbstractRepository): 
  @staticmethod
  def recent_in_category(category_id):
    return Post.objects.filter(...)

# the optional parent abstract can expose generic repo methods like count()

## client code

from app.repositories import PostRepository

def get(category_id):
  return list(
    PostRepository.recent_in_category(category_id)
  )

In a fanatically loyal implementation of the pattern, the repository methods should be fed with "criteria" objects rather than primitive values. Django's Q class is probably a good candidate for creating such criteria objects.

However, if I'm trying to push the specific Django ORM impl. details under the hood of a repository, I wouldn't leak & couple my repository impl. with a Django-specific concept like Q that the client code needs to know about. I'd simply pass vanilla data structures.
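One way to picture "vanilla data structures" is a small framework-free criteria object that the repository translates into ORM lookups internally. This is a rough sketch, not a prescribed design: PostCriteria and to_orm_lookups are hypothetical names, and the lookup keys are the ones used in the question's filter chain. The actual Post.objects.filter(**lookups) call is left out so the sketch runs without Django.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class PostCriteria:
    """Hypothetical framework-free criteria passed in by client code."""
    category_id: Optional[int] = None
    published_since: Optional[datetime] = None
    include_deleted: bool = False


def to_orm_lookups(criteria: PostCriteria) -> Dict[str, Any]:
    """Translate criteria into Django-style filter keyword arguments.

    Inside a repository, the result would be applied as
    Post.objects.filter(**lookups); that call is omitted here.
    """
    lookups: Dict[str, Any] = {}
    if criteria.category_id is not None:
        lookups["category_id"] = criteria.category_id
    if criteria.published_since is not None:
        lookups["published_at__gte"] = criteria.published_since
    if not criteria.include_deleted:
        # Soft-deleted rows are hidden unless explicitly requested.
        lookups["deleted_at__isnull"] = True
    return lookups
```

The client only ever sees PostCriteria, so the double-underscore lookup syntax stays an implementation detail of the repository.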

Downsides

In huge data-intensive applications, repositories can easily turn into hard-to-maintain God objects, with each method representing one named query against the target datasource. Imagine a class with 42 different methods. It doesn't scale.

Bogard has a few good cases against repositories usually in favor of Query Objects.


Table Data Gateway

This is an object that encapsulates all the legit operations over a database table as a Gateway impl. One instance of such object represents the whole table and all legit operations against it.

It's that objects object in your Post.objects.filter(...) example. In Django it's broadly (and imo badly) named a "Model Manager"; it resides "below" the models, in contrast with repositories that sit "on top" of them. Building upon this existing Django pattern is probably a good option if we want to look at the problem as a Django-specific one.

Implementation

We implement custom methods on a custom manager and make sure that the model uses our custom manager rather than the default implementation from Django.

## app/models.py

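from datetime import timedelta

from django.db import models
from django.utils import timezone

class PostManager(models.Manager):
  def recent_in_category(self, category_id):
    # compute the cutoff at call time rather than at import time
    yesterday = timezone.now() - timedelta(days=1)
    return self.filter(
      category_id=category_id,
      deleted_at__isnull=True,
      published_at__gte=yesterday
    )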
  # other repository methods follow...


class Post(models.Model):
  title = models.CharField(max_length=50)
  # other fields declarations follow...

  # set the custom model manager instance
  # or even better: give it a custom name that reflects the pattern
  objects = PostManager() 

  # custom model methods follow...

## client code
from app.models import Post

def get(category_id): 
  return list(
    Post.objects.recent_in_category(category_id)
  )

Despite tiny differences, it'd be an imprecise but fair enough statement that it could be considered a "Django-specific repository pattern impl." as well. If so, one could name the manager so that it reflects the pattern:

## client code

# without criteria:
Post.repository.recent_in_category(cid)

# or with criteria:
Post.repository.recent(Q(category_id=cid))

Obviously, it'd only work for codebases where being decoupled from the framework is not a criterion.
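For contrast, here is roughly what a Table Data Gateway looks like without any framework, using only the standard-library sqlite3 module. Treat it as an illustrative sketch: the PostGateway class, the post table and its columns are assumptions made up for this example, and timestamps are stored as ISO-8601 text purely to keep it short.

```python
import sqlite3
from datetime import datetime
from typing import List, Tuple


class PostGateway:
    """One instance stands for the whole (hypothetical) `post` table."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._conn = connection

    def create_table(self) -> None:
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS post ("
            " id INTEGER PRIMARY KEY,"
            " category_id INTEGER NOT NULL,"
            " title TEXT NOT NULL,"
            " published_at TEXT NOT NULL,"
            " deleted_at TEXT)"
        )

    def insert(self, category_id: int, title: str, published_at: datetime) -> int:
        cursor = self._conn.execute(
            "INSERT INTO post (category_id, title, published_at) VALUES (?, ?, ?)",
            (category_id, title, published_at.isoformat()),
        )
        return cursor.lastrowid

    def recent_in_category(self, category_id: int, since: datetime) -> List[Tuple[int, str]]:
        # Naive ISO-8601 strings sort chronologically, so a plain text
        # comparison is enough for this sketch.
        rows = self._conn.execute(
            "SELECT id, title FROM post"
            " WHERE category_id = ? AND deleted_at IS NULL AND published_at >= ?"
            " ORDER BY id",
            (category_id, since.isoformat()),
        )
        return rows.fetchall()
```

Every SQL statement touching the table lives in this one class, which is the same role Post.objects plays in the Django version above.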


Query Object

Imagine each database query being represented by its own object. It's a popular pattern, especially in the anti-repository camp out there.

It's basically the repository pattern, but instead of each query being a repository method, it's a self-contained, independent class. That means better maintainability in large codebases.

## app/queries.py

from app.models import Post

# AbstractQuery simply enforces derived classes to implement execute()

class RecentInCategoryPostsQuery(AbstractQuery):
  @staticmethod
  def execute(category_id):     
    return Post.objects.filter(...)
    # could also accept paging params or criteria objects maybe.
   
    # or it could be non-static & accept the category_id as a constructor 
    # param.

## client code

from app.queries import RecentInCategoryPostsQuery

def get(category_id):
  return HttpResponse(list(
    RecentInCategoryPostsQuery.execute(category_id)
  ))
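The non-static variant mentioned in the comments above could look something like the following sketch. It is deliberately framework-free so it runs on its own: the Post dataclass and the in-memory source iterable are hypothetical stand-ins for the Django model and Post.objects, and AbstractQuery is written out as one possible way to enforce execute().

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


class AbstractQuery(ABC):
    """Forces every query object to implement execute()."""

    @abstractmethod
    def execute(self):
        ...


@dataclass(frozen=True)
class Post:
    """Plain stand-in for the Post model (hypothetical)."""
    id: int
    category_id: int
    published_at: datetime
    deleted_at: Optional[datetime] = None


class RecentInCategoryPostsQuery(AbstractQuery):
    """One class per named query; parameters are fixed at construction."""

    def __init__(self, source: Iterable[Post], category_id: int, since: datetime) -> None:
        self._source = source
        self._category_id = category_id
        self._since = since

    def execute(self) -> List[Post]:
        return [
            post for post in self._source
            if post.category_id == self._category_id
            and post.deleted_at is None
            and post.published_at >= self._since
        ]
```

Because the parameters live on the instance, a configured query can be passed around, stored, or executed later, which a static execute(category_id) does not allow.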

Other patterns

Other data source architectural patterns as cataloged by Fowler are Row Data Gateway (a domainlogic-less AR), Table Module (a domainlogic-ful TDG) & Unit of Work (atomic transactions). They are only contextually related to your problem, so let's skip 'em.


Further reading

Fowler's "Patterns of Enterprise Application Architecture", while now old, is still a life-changing classic of the genre that can put any developer on the path to becoming an architect. I can only recommend it. Don't be afraid to skip over the old rusty patterns.

A lighter, newer, and more Python/Django-specific read would be "Architecture Patterns with Python". I've only skimmed through it, but it looks like a good one.


Conclusion

some DRY helpers, you know, to avoid searching the whole repository in order to find all similar usecases if I need to change the behavior slightly

You know, architecture is usually about making good trade-offs considering the desired outputs and the situation. If the goal here is to develop some DRY helpers, as you've put it, using the "Table Data Gateway" and extending each model's default Django "model manager" sounds like a good, consistent, OO and Django-friendly approach.

In a more framework-agnostic project, I would personally go for "repository" if it's not a data-intensive application and, otherwise, the "query object", which scales much better.

sepehr