The subject you're interested in is what Fowler's PoEAA (Patterns of Enterprise Application Architecture) calls "Datasource Architectural Patterns". So, what is a datasource layer, and what are the common architectural patterns for it?
Datasource Layer
I would like to use some kind of a database layer in the app. Most of
the applications use ORM and it is possible to build complex queries
there. What if I don’t want to use a query builder and prefer
encapsulating it in a function or in a class
Broadly speaking, as you might already know, you're looking to implement a Data Access Layer (DAL) that the rest of your application will interface with. It serves as a layer of indirection that not only helps decouple the application from the infrastructure (a specific database or ORM framework), but also helps remove duplication. It's also called a "Datasource Layer".
Datasource Architectural Patterns
Implementation-wise, such a layer consists of a set of objects, or as you've mentioned, functions. Let's first talk functions.
"Layering can occur at any of these levels. A small program may just put separate functions for the layers into different files. A larger
system may have layers corresponding to namespaces with many classes
in each." — PresentationDomainDataLayering, Fowler
How would you call this approach/pattern? Where would you put the
code?
In your specific example, where a function wraps the Active Record (AR) object's chain of method calls, the closest pattern I can think of is the "Transaction Script". It is not a datasource-specific pattern, but a generic approach to encapsulating business logic; one that is neither functional nor object-oriented, but procedural.
Such functions can probably be organized in a module named after your data encapsulation architectural layer, whatever you'd like to call it. Naming candidates could range from datasource.py to dal.py and other subjectively good names. Since this pattern is not usual within OO codebases, I don't think you can easily find a standard, community-accepted name for it.
The pattern's simplicity might work for small codebases. However, in large data-intensive codebases that deal with many queries, it is more likely to cause maintenance and decision-making problems down the road than the more modern and widely accepted patterns out there.
Also, I personally prefer to keep things in a consistent style. If Django comes with OO models, I would keep things OO. Such orphan non-util functions in an OO setting would be unfamiliar to most OO programmers who will onboard your project in the future.
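For illustration, the function-based approach discussed above could look roughly like the sketch below. It is a minimal, framework-free example using the standard library's sqlite3 instead of Django; the module name dal.py, the table layout, and both function names are assumptions made up for this example.

```python
# dal.py (hypothetical module name): plain functions that wrap queries
import sqlite3


def recent_posts_in_category(conn, category_id, limit=10):
    """Return titles of non-deleted posts in a category, newest first."""
    rows = conn.execute(
        "SELECT title FROM post "
        "WHERE category_id = ? AND deleted_at IS NULL "
        "ORDER BY published_at DESC LIMIT ?",
        (category_id, limit),
    ).fetchall()
    return [title for (title,) in rows]


def make_demo_connection():
    """Build an in-memory database with a few rows (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post (title TEXT, category_id INTEGER, "
        "published_at TEXT, deleted_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO post VALUES (?, ?, ?, ?)",
        [
            ("old", 1, "2024-01-01", None),
            ("new", 1, "2024-02-01", None),
            ("gone", 1, "2024-03-01", "2024-03-02"),
            ("other", 2, "2024-02-15", None),
        ],
    )
    return conn
```

Client code would only ever call recent_posts_in_category(), so a change to the query happens in exactly one place.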
Let's talk objects now. In OO codebases, such objects form an on-demand "virtual object database" in memory. These objects are often implemented following datasource architectural patterns belonging to both the "Object Relational Mapping" and the more generic "DAO" pattern (simply: separate the data access concern) categories. Let's try to scratch their surface.
ORM patterns such as Active Record (AR) and Data Mapper (DM) took all communities by storm. You know all about them, so there's nothing to say here.
The only little note I'd like to build upon is that, as you've experienced, they only buy us "some decoupling": their usually expressive FluentInterface almost always needs a "home" to reside in, so that we can avoid duplication and increase maintainability.
That home would also host any rare performance-sensitive query-building logic that might not fit well with standard ORMs. Such a home will help us draw a clear data access boundary.
For the more curious, see:
Flask SQLAlchemy Data Mapper vs Active Record Pattern
Does the term Repository fit here?
It very well does. A repository object per Django model could serve us as that "home".
A set of repositories acts as a layer of indirection that mediates between the domain layer and the datasource logic. It's usually the front facade of your DAL that mainly implements the DAO pattern and "wraps over" underlying ORM patterns (like AR or DM) or sometimes even vanilla traditional queries.
"A system with a complex domain model often benefits from a layer,
such as the one provided by Data Mapper (165), that isolates domain
objects from details of the database access code. In such systems it
can be worthwhile to build another layer of abstraction over the
mapping layer where query construction code is concentrated. This
becomes more important when there are a large number of domain classes
or heavy querying. In these cases particularly, adding this layer
helps minimize duplicate query logic. A Repository mediates between
the domain and data mapping layers, acting like an in-memory domain
object collection." — PoEAA, Fowler
In languages with native interface support, each repository implementation is usually named after its source (e.g. UserDjangoRepository, UserLdapRepository or UserMongoRepository) and resides behind a generic UserRepositoryInterface that the rest of the application consumes (the interface resides in the domain layer, while the concretes are pushed to the infrastructure layer).
Just like the "Table Data Gateway" object we're going to explore below, a Repository is also a representation of multiple records. The difference is that repositories tend to be a more generic, collection-like abstraction over the data, with no notion of database operations (e.g. latest() vs. order_by_date_desc()).
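As a rough sketch of the interface-plus-concretes idea, Python can approximate a native interface with an abstract base class. The in-memory concrete, the User dataclass, and the helper newest_member_name() are hypothetical names introduced only for this example; a UserDjangoRepository would implement the same interface by wrapping the ORM.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class User:
    name: str
    joined: date


class UserRepositoryInterface(ABC):
    """Lives in the domain layer; knows nothing about storage."""

    @abstractmethod
    def add(self, user): ...

    @abstractmethod
    def latest(self, count=1): ...


class UserInMemoryRepository(UserRepositoryInterface):
    """Hypothetical concrete that belongs to the infrastructure layer."""

    def __init__(self):
        self._users = []

    def add(self, user):
        self._users.append(user)

    def latest(self, count=1):
        # Collection-like wording: callers ask for "latest", not "ORDER BY".
        ordered = sorted(self._users, key=lambda u: u.joined, reverse=True)
        return ordered[:count]


def newest_member_name(repo: UserRepositoryInterface) -> str:
    """Client code depends on the interface only."""
    return repo.latest(1)[0].name
```

Swapping the storage then means writing another concrete class, while newest_member_name() and the rest of the domain code stay untouched.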
Implementation
## app/repositories.py
from app.models import Post


class PostRepository(AbstractRepository):
    @staticmethod
    def recent_in_category(category_id):
        return Post.objects.filter(...)

    # the optional parent abstract can expose generic repo methods like count()


## client code
from app.repositories import PostRepository


def get(id):
    return list(
        PostRepository.recent_in_category(id)
    )
In a fanatically loyal implementation of the pattern, the repository methods should be fed with "criteria" objects rather than primitive values. Django's Q class is probably a good candidate for creating such criteria objects.
However, if I'm trying to push the specific Django ORM implementation details under the hood of a repository, I wouldn't leak a Django-specific concept like Q into the repository's interface, coupling the client code to it. I'd simply pass vanilla data structures.
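One way to read "vanilla data structures" is a small, framework-free criteria object. The sketch below is only an assumption of how that could look: PostCriteria, InMemoryPostRepository and matching() are invented names, and posts are plain dicts so the example runs without Django. A Django-backed repository could translate the same criteria into filter() arguments internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PostCriteria:
    """Hypothetical criteria object with no framework imports."""
    category_id: Optional[int] = None
    include_deleted: bool = False


class InMemoryPostRepository:
    """Stand-in repository; each post is a plain dict in this sketch."""

    def __init__(self, posts):
        self._posts = list(posts)

    def matching(self, criteria):
        result = []
        for post in self._posts:
            wrong_category = (
                criteria.category_id is not None
                and post["category_id"] != criteria.category_id
            )
            hidden = post["deleted"] and not criteria.include_deleted
            if not wrong_category and not hidden:
                result.append(post)
        return result
```

The client only needs to know PostCriteria, which keeps Q (or any other ORM type) out of the calling code.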
Downsides
In huge data-intensive applications, repositories can easily turn into hardly maintainable God objects, where each method represents one named query against the target datasource. Imagine a class with 42 different methods. It doesn't scale.
Bogard has a few good cases against repositories usually in favor of Query Objects.
A Table Data Gateway is an object that acts as a Gateway to a database table: one instance represents the whole table and encapsulates all the legit operations against it.
It's that objects object in your Post.objects.filter(...) example. It is broadly (and imo badly) named a "Model Manager" in Django, and it resides "below" the models, in contrast with repositories that sit "on top". Building upon this existing Django pattern is probably a good option if we want to look at the problem as a Django-specific one.
Implementation
We implement custom methods on a custom manager and make sure that the model uses our custom manager rather than Django's default implementation.
## app/models.py
from datetime import timedelta

from django.db import models
from django.utils import timezone


class PostManager(models.Manager):
    def recent_in_category(self, category_id):
        # assumption: "recent" means published within the last 24 hours
        yesterday = timezone.now() - timedelta(days=1)
        return self.filter(
            category_id=category_id,
            deleted_at__isnull=True,
            published_at__gte=yesterday,
        )

    # other repository methods follow...


class Post(models.Model):
    title = models.CharField(max_length=50)
    # other field declarations follow...

    # set the custom model manager instance
    # or even better: give it a custom name that reflects the pattern
    objects = PostManager()

    # custom model methods follow...


## client code
from app.models import Post


def get(category_id):
    return list(
        Post.objects.recent_in_category(category_id)
    )
Despite tiny differences, it would be imprecise but fair enough to consider this a "Django-specific repository pattern implementation" as well. If so, one could name the manager so that it reflects the pattern:
## client code
# without criteria:
Post.repository.recent_in_category(cid)
# or with criteria:
Post.repository.recent(Q(category_id=cid))
Obviously, it'd only work for codebases where being decoupled from the framework is not a criterion.
Imagine each database query being represented by its own object. This is the Query Object pattern, which is popular especially in the anti-repository camp out there.
It's basically the repository pattern, but instead of each query being a repository method, each query is a self-contained, independent class. This gives better maintainability in large codebases.
## app/queries.py
from app.models import Post


# AbstractQuery simply enforces derived classes to implement execute()
class RecentInCategoryPostsQuery(AbstractQuery):
    @staticmethod
    def execute(category_id):
        return Post.objects.filter(...)

    # could also accept paging params or criteria objects maybe.
    # or it could be non-static & accept the category_id as a constructor
    # param.


## client code
from django.http import HttpResponse

from app.queries import RecentInCategoryPostsQuery


def get(category_id):
    return HttpResponse(list(
        RecentInCategoryPostsQuery.execute(category_id)
    ))
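The non-static variant mentioned in the comments above might be sketched as follows. This is a framework-free approximation under stated assumptions: AbstractQuery is written with abc, the query receives its data source as an argument to execute(), and posts are plain dicts; none of this comes from Django itself, where execute() would build and return a QuerySet instead.

```python
from abc import ABC, abstractmethod


class AbstractQuery(ABC):
    """Forces every query class to implement execute()."""

    @abstractmethod
    def execute(self, source): ...


class RecentInCategoryPostsQuery(AbstractQuery):
    def __init__(self, category_id, limit=10):
        # Parameters are bound at construction time instead of being
        # passed to a static method.
        self.category_id = category_id
        self.limit = limit

    def execute(self, source):
        # `source` is any iterable of dicts in this sketch (an assumption).
        matches = [p for p in source if p["category_id"] == self.category_id]
        matches.sort(key=lambda p: p["published_at"], reverse=True)
        return matches[: self.limit]
```

Because the parameters live on the instance, a query can be built in one place, passed around, and executed later, which is part of what makes this pattern easier to grow than a repository with dozens of methods.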
Other patterns
Other related patterns cataloged by Fowler are Row Data Gateway (a domainlogic-less AR), Table Module (a domainlogic-ful Table Data Gateway) & Unit of Work (atomic transactions). They are only contextually related to your problem, so let's skip 'em.
Further reading
Fowler's "Patterns of Enterprise Application Architecture", while now old, is still a life-changing classic of the genre that would put any developer on the path to being an architect. I can only recommend it. Don't be afraid of jumping over the old rusty patterns.
A lighter, newer and more Python/Django-specific read would be "Architecture Patterns with Python". I only skimmed through it; it looks like a good one though.
Conclusion
some DRY helpers, you know, to avoid searching the whole repository in
order to find all similar usecases if I need to change the behavior
slightly
You know, architecture is usually about making good trade-offs considering the desired outputs and the situation. If the goal here is to develop some DRY helpers as you've put it, using the "Table Data Gateway" and extending each model's default Django "model manager" sounds like a good, consistent, OO and Django-friendly approach.
In a more framework-agnostic project, I would personally go for the "repository" if it's not a data-intensive application, and otherwise for the "query object", which scales much better.