
We have implemented a repository pattern at work. All our repositories sit behind their own interfaces and are mapped via Ninject. Our project is quite large, and there are a couple of quirks with this pattern I'm trying to solve.

First, there are some controllers where we need upwards of 10 to 15 repositories in the same controller. The constructor gets rather ugly when asking for so many repositories. The second quirk reveals itself after you call methods on multiple repositories: once the work is done, we need to call the SaveChanges method, but which repository should we call it on? Every repository has one. All repositories have the same instance of the Entity Framework data context injected, so picking any repository at random to call save on will work. It just seems so messy.

I looked up the "Unit Of Work" pattern and came up with a solution that I think solves both problems, but I'm not 100% confident in this solution. I created a class called DataBucket.

// Slimmed down for readability
public class DataBucket
{
    private DataContext _dataContext;

    public IReportsRepository ReportRepository { get; set; }
    public IEmployeeRepository EmployeeRepository { get; set; }
    public IDashboardRepository DashboardRepository { get; set; }

    public DataBucket(DataContext dataContext,
        IReportsRepository reportsRepository,
        IEmployeeRepository employeeRepository,
        IDashboardRepository dashboardRepository)
    {
        _dataContext = dataContext;
        this.ReportRepository = reportsRepository;
        this.EmployeeRepository = employeeRepository;
        this.DashboardRepository = dashboardRepository;
    }

    public void SaveChanges()
    {
        _dataContext.SaveChanges();
    }
}

This appears to solve both issues: there is now only one SaveChanges method, on the data bucket itself, and you inject only one object, the data bucket. You then access all the repositories as properties. The data bucket would be a little messy-looking, though, since it would be accepting ALL (easily 50 or more) of our repositories in its constructor.

The process of adding a new repository would now include: creating the interface, creating the repository, mapping the interface and repository in Ninject, and adding a property to the data bucket and populating it.

I did think of an alternative to this that would eliminate a step from above.

public class DataBucket
{
    private DataContext _dataContext;

    public IReportsRepository ReportRepository { get; set; }
    public IEmployeeRepository EmployeeRepository { get; set; }
    public IDashboardRepository DashboardRepository { get; set; }

    public DataBucket(DataContext dataContext)
    {
        _dataContext = dataContext;
        this.ReportRepository = new ReportsRepository(dataContext);
        this.EmployeeRepository = new EmployeeRepository(dataContext);
        this.DashboardRepository = new DashboardRepository(dataContext);
    }

    public void SaveChanges()
    {
        _dataContext.SaveChanges();
    }
}

This one pretty much eliminates all the repository mappings in Ninject because they are all instantiated inside the data bucket. The steps for adding a new repository are now: create the interface, create the repository, and add a property to the data bucket and instantiate it.
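A further variation that comes up in the comments is to instantiate each repository only the first time it is accessed. A minimal sketch of that lazy version, reusing the hypothetical names from above (one repository shown):

```csharp
// Sketch only: same DataBucket idea, but each repository is created
// on first access instead of in the constructor.
public class DataBucket
{
    private readonly DataContext _dataContext;
    private IReportsRepository _reportRepository;

    public DataBucket(DataContext dataContext)
    {
        _dataContext = dataContext;
    }

    // The repository is newed up on the first read and cached afterwards,
    // so repositories a given request never touches are never constructed.
    public IReportsRepository ReportRepository
    {
        get
        {
            return _reportRepository
                ?? (_reportRepository = new ReportsRepository(_dataContext));
        }
    }

    public void SaveChanges()
    {
        _dataContext.SaveChanges();
    }
}
```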

Can you see any flaws with this model? On the surface it seems much more convenient to consume our repositories in this way. Is this a problem that has been addressed before? If so, what is the most common and/or most efficient approach to this issue?

halfer
CatDadCode
  • Don't you need an IDataBucket interface, since this class uses the repository classes and is itself injected? That aside, it seems like a good solution to me. I think it would be better to give the repository properties a private setter. – Maarten Aug 23 '12 at 17:30
  • You can inject concrete classes with Ninject. If you provide no mapping then Ninject will look for a class with the name requested and instantiate it if possible. – CatDadCode Aug 23 '12 at 17:31
  • For unit testing purposes though @Maarten you are probably right. I will have to put it behind an interface so I can mock it for testing. – CatDadCode Aug 23 '12 at 17:42
  • "Unit of Work" sounds funny, but "Data Bucket" does not? – mclark1129 Aug 23 '12 at 18:36

4 Answers

6

First, there are some controllers where we need upwards of 10 to 15 repositories all in the same controller.

Say hello to the Abstract Factory pattern. Instead of registering all repositories in Ninject and injecting them into controllers, register just a single implementation of a factory that can provide any repository you need - you can even create them lazily, only if the controller really needs them. Then inject the factory into the controller.
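A minimal sketch of what that factory could look like (the interface and method names are hypothetical, reusing the repository types from the question):

```csharp
// Sketch only: one factory interface replaces 10-15 constructor parameters.
public interface IRepositoryFactory
{
    IReportsRepository CreateReportsRepository();
    IEmployeeRepository CreateEmployeeRepository();
    IDashboardRepository CreateDashboardRepository();
}

public class RepositoryFactory : IRepositoryFactory
{
    private readonly DataContext _dataContext;

    public RepositoryFactory(DataContext dataContext)
    {
        _dataContext = dataContext;
    }

    // Each repository is created only when a controller actually asks
    // for it, and all of them share the factory's context.
    public IReportsRepository CreateReportsRepository()
    {
        return new ReportsRepository(_dataContext);
    }

    public IEmployeeRepository CreateEmployeeRepository()
    {
        return new EmployeeRepository(_dataContext);
    }

    public IDashboardRepository CreateDashboardRepository()
    {
        return new DashboardRepository(_dataContext);
    }
}
```

A controller then takes a single IRepositoryFactory in its constructor and calls only the Create methods it needs.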

Yes, it also has a disadvantage - you are giving the controller permission to get any repository. Is that a problem for you? You can always create multiple factories for some subsystems if you need to, or simply expose multiple factory interfaces on a single implementation. It still doesn't cover all cases, but it is better than passing 15 parameters to a constructor. By the way, are you sure those controllers should not be split?

Note: this is not the Service Locator anti-pattern.

After doing work with multiple repositories we need to call the SaveChanges method, but which repository should we call it on?

Say hello to the Unit of Work pattern. A unit of work is a logical transaction in your application: it persists all changes from the logical transaction together. The repository should not be responsible for persisting changes - the unit of work should be. Somebody mentioned that DbContext is an implementation of the Repository pattern. It is not. It is an implementation of the Unit of Work pattern, and DbSet is an implementation of the Repository pattern.

What you need is a central class holding the instance of the context. The context will also be passed to the repositories, because they need it to retrieve data, but only the central class (the unit of work) will offer saving changes. It can also handle a database transaction if you, for example, need to change the isolation level.
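As a sketch, that central class might look like the following; the isolation-level overload is my own assumption about how TransactionScope could be wired in, not something prescribed by the pattern:

```csharp
using System;
using System.Transactions;

// Sketch only: the unit of work owns the context. Repositories receive
// the same context for querying, but only this class can save.
public class UnitOfWork : IDisposable
{
    private readonly DataContext _dataContext;

    public UnitOfWork(DataContext dataContext)
    {
        _dataContext = dataContext;
    }

    public void SaveChanges()
    {
        _dataContext.SaveChanges();
    }

    // Hypothetical overload: wrap the save in an ambient transaction
    // with an explicit isolation level.
    public void SaveChanges(IsolationLevel isolationLevel)
    {
        var options = new TransactionOptions { IsolationLevel = isolationLevel };
        using (var scope = new TransactionScope(TransactionScopeOption.Required, options))
        {
            _dataContext.SaveChanges();
            scope.Complete();
        }
    }

    public void Dispose()
    {
        _dataContext.Dispose();
    }
}
```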

Where should the unit of work be handled? That depends on where your logical operation is orchestrated. If the operation is orchestrated directly in the controller's actions, you need to have the unit of work in the action as well and call SaveChanges once all modifications are done.

If you don't care too much about separation of concerns, you can even combine the unit of work and the factory into a single class. That brings us to your DataBucket.

Ladislav Mrnka
  • Your idea sounds similar to my revised data bucket class where I only instantiate repositories as they are needed. https://gist.github.com/3440615 What do you think of that? Or do you have a different idea for how this factory should "create" repositories? Would this factory have a `CreateReportsRepository` method? If so then it would have just as many methods as my bucket has properties. Or are you thinking abstract it into one method which would instantiate something like an `IRepository` and then cast it where it's needed? – CatDadCode Aug 23 '12 at 19:45
  • Yes, it would need to have as many methods as you have repositories. It may grow to an ugly quantity - that is the time to think about separate factories. Unless you have a generic repository implementing the same generic interface (which I consider bad) you will not make it better. With the same generic interface you can try to hide the complexity behind a single method, but you still need logic to instantiate the correct repository. – Ladislav Mrnka Aug 23 '12 at 19:55
  • Let's take this one more step down the rabbit hole. Let's say I create two factories now that contain related repositories. In some part of my application I need to use both of those factories to make changes. Which factory do I call save changes on? It seems I've recreated the same problem as the repository-level save methods. – CatDadCode Aug 23 '12 at 19:58
  • I guess we could then create a factory bucket containing the save and a reference to each factory. This could go on indefinitely, lol! – CatDadCode Aug 23 '12 at 20:01
  • Avoid such a situation. Always use a single unit of work. If your units of work share the same context, you could call `SaveChanges` on any of them, but that would break the meaning of a unit of work. Each unit of work should be the unique owner of its context. – Ladislav Mrnka Aug 23 '12 at 20:01
2

I think you are absolutely right to use the Unit of Work pattern in this case. Not only does it prevent you from needing a SaveChanges method on every repository, it provides a nice way to handle transactions in code rather than in the database itself. I included a Rollback method with my UOW so that if there was an exception I could undo any changes the operation had already made to my DataContext.
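A rough sketch of such a Rollback method, assuming the EF DbContext change-tracker API (`ChangeTracker.Entries()`, `Reload()`); if your DataContext is a different wrapper, the mechanics will differ:

```csharp
// Sketch only: discard pending changes on an EF DbContext so a failed
// operation leaves the context clean.
public void Rollback()
{
    foreach (var entry in _dbContext.ChangeTracker.Entries())
    {
        switch (entry.State)
        {
            case EntityState.Added:
                // Forget objects that were never in the database.
                entry.State = EntityState.Detached;
                break;
            case EntityState.Modified:
            case EntityState.Deleted:
                // Restore the original database values.
                entry.Reload();
                break;
        }
    }
}
```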

One thing you could do to prevent weird dependency issues would be to group related repositories into their own units of work, rather than having one big DataBucket that holds every repository you have (if that was your intent). Each UOW would only need to be accessible at the same level as the repositories it contains, and repositories probably should not depend on UOWs themselves (your repositories shouldn't need to use other repositories).

If you wanted to be an even bigger purist about the pattern, you could also structure your UOWs to represent just that: a single unit of work. You define each one to represent a specific operation in your domain and provide it with the repositories required to complete that operation. An individual repository could exist on more than one UOW if it is used by more than one operation in your domain.

For example, a PlaceCustomerOrderUnitOfWork may need a CustomerRepository, OrderRepository, BillingRepository, and a ShippingRepository, while a CreateCustomerUnitOfWork may need just a CustomerRepository. Either way, you can easily pass that dependency around to its consumers, and finer-grained interfaces for your UOWs help target your testing and reduce the effort needed to create mocks.
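Sketched in code, with all types hypothetical illustrations of the naming above:

```csharp
// Sketch only: an operation-scoped unit of work that exposes exactly the
// repositories the "place customer order" operation needs, no more.
public class PlaceCustomerOrderUnitOfWork
{
    private readonly DataContext _dataContext;

    public ICustomerRepository Customers { get; private set; }
    public IOrderRepository Orders { get; private set; }
    public IBillingRepository Billing { get; private set; }
    public IShippingRepository Shipping { get; private set; }

    public PlaceCustomerOrderUnitOfWork(DataContext dataContext,
        ICustomerRepository customers,
        IOrderRepository orders,
        IBillingRepository billing,
        IShippingRepository shipping)
    {
        _dataContext = dataContext;
        Customers = customers;
        Orders = orders;
        Billing = billing;
        Shipping = shipping;
    }

    // The whole operation commits (or fails) as one logical transaction.
    public void SaveChanges()
    {
        _dataContext.SaveChanges();
    }
}
```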

mclark1129
  • Interesting insights. If I do go with my original idea of one data bucket containing everything, do you see a problem of having all the repositories attached to it instantiated? Another solution would be to new up the repo the first time the property on the UOW is accessed. [Something like this](https://gist.github.com/3440615). That way only the repos that are needed during that transaction would be instantiated. What do you think? – CatDadCode Aug 23 '12 at 19:27
  • @AlexFord that's exactly how I implemented mine, by "lazy-loading" the repositories `if(repo == null) then //instantiate`. Other than creating a monolithic interface, keeping everything in one DataBucket should be fine providing you intend on all your Repository implementations going in the same assembly as your Unit of Work implementation. Not that you COULDN'T have them in different ones, however this opens you up for potential to have weird cyclical dependencies. – mclark1129 Aug 23 '12 at 19:49
  • Yeah everything we are discussing here is done in our "Domain" assembly so we shouldn't have to worry there. Thanks for your input. You've been a great help. – CatDadCode Aug 23 '12 at 19:52
1

The notion of every repository having a SaveChanges method is flawed because calling it saves everything. It is not possible to save only part of a DataContext; you always save everything. So a class holding a central DataContext is a good idea.

Alternatively, you could have a repository with generic methods that can operate on any entity type (GetTable<T>, Query<T>, ...). That would get rid of all those classes and merge them into one (basically, only DataBucket remains).
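A rough sketch of that single generic repository, assuming an EF DbContext underneath (`Query<T>` mirrors the naming above):

```csharp
using System.Data.Entity;
using System.Linq;

// Sketch only: one repository class serves every entity type, so the
// dozens of per-entity repository classes collapse into this.
public class GenericRepository
{
    private readonly DbContext _context;

    public GenericRepository(DbContext context)
    {
        _context = context;
    }

    // Callers compose their own queries against the returned IQueryable.
    public IQueryable<T> Query<T>() where T : class
    {
        return _context.Set<T>();
    }

    public void Add<T>(T entity) where T : class
    {
        _context.Set<T>().Add(entity);
    }

    public void Remove<T>(T entity) where T : class
    {
        _context.Set<T>().Remove(entity);
    }

    public void SaveChanges()
    {
        _context.SaveChanges();
    }
}
```

Callers would then write queries directly, e.g. `repo.Query<Report>().Where(r => r.IsActive)`; note that the comment thread on this answer debates whether exposing IQueryable this way is a good idea.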

It might even be the case that you don't need repositories at all: you can inject the DataContext itself! The DataContext by itself is a repository and a full-fledged data access layer. It doesn't lend itself to mocking, though.

Whether you can do this depends on what you need the "repository" to provide.


The only issue with that DataBucket class is that it needs to know about all entities and all repositories. So it sits very high in the software stack (at the top). At the same time it is used by basically everything, so it sits at the bottom, too. Wait! That is a dependency cycle over the whole codebase.

This means that everything using it and everything used by it must sit in the same assembly.

usr
  • Right, the repository level savechanges makes no sense. That's why I'm tackling this :). I fiddled with a generic repository and ended up finding that too many repositories needed special methods and functionality to fit into a generic template. I also do not like building LINQ expression trees above the repository level because it kind of defeats the purpose of abstraction; It basically makes the upper layers dependent upon a data layer that will always understand expression trees against IQueryable. – CatDadCode Aug 23 '12 at 17:40
  • On the other hand, not allowing arbitrary expressions outside of the repo forces you to add specialized query methods (and return-type classes) for all non-trivial (entity-get) use cases. Speaking from experience with a big codebase with 300+ tables: the repository pattern is not worth it. We never change ORMs. We never change our RDBMS. We never decide to switch to NoSQL. It just never happens. A repository abstraction would add nothing useful, only effort and complexity. – usr Aug 23 '12 at 17:45
  • In your situation I would likely agree. Unfortunately I am tasked with maintaining separation of concerns in such away that we _could_ switch to NoSQL if we wanted to. – CatDadCode Aug 23 '12 at 17:47
  • Do you see any issues with including all the specialized repositories as properties on the data bucket? Will instantiating ALL of the repos with the data bucket be a performance concern? In our current configuration only the repos that are needed get instantiated. – CatDadCode Aug 23 '12 at 17:49
  • You can't switch anyway, because you will be relying on many features of your RDBMS implicitly (for example, cross-entity transactions, range scans on indexes, locking, ...). The list is endless. You *can't* just switch, because performance concerns are cross-cutting; they cannot be abstracted away. Not sure if you are in a position to convince the team on this issue. I added commentary on an architectural issue to my answer. – usr Aug 23 '12 at 17:52
  • I'm not quite following. Right now it would seem to me that, because we don't use our entity objects any different than simple objects, we could just code up all our entities as dumb objects instead of EF objects and then rewrite our repositories to use an entirely different database tech to fill those objects. No? – CatDadCode Aug 23 '12 at 17:55
  • @BZink Yes, but how do you unit test it? It has to be abstracted if your goal is to unit test it. – CatDadCode Aug 23 '12 at 17:57
  • You can use any data technology to fill your objects. It will have different perf characteristics, though. It is not a correctness issue, it is a performance issue. The application will be built on the assumption that certain joins are fast. With NoSQL, joins suddenly become slow. Now the app breaks, because it is slow. It works, but too slowly. My point is you depend on your data store in very subtle ways. Anyway, if you want to switch, just branch your trunk, rip out the old stuff and add the new, then merge back. This is a rare event after all. – usr Aug 23 '12 at 18:03
  • I'm not really intending to have a debate about which database tech to use. The point is that we could switch to something entirely different with as little refactoring of business logic as possible. I was only using NoSQL as an example. It's not so much of a rare event for us. We've already done it twice and only the second time did we do the repository pattern so that the next time it will be easier. It is a very realistic possibility for our shop that we could perpetually be swapping database tech as time goes on. – CatDadCode Aug 23 '12 at 18:06
  • @AlexFord I don't disagree. I've struggled with EF for this reason too. – BZink Aug 23 '12 at 18:07
  • @AlexFord why would you need to build Expressions above the repository level with a generic repository? Just pass them as a lambda expression! `customerRepository.GetAll(x => x.CustomerId == 1234);` – mclark1129 Aug 23 '12 at 18:55
  • @BZink `DataContext` is actually more of a Unit of Work than a repository. The tables within the `DataContext` would be the repositories. – mclark1129 Aug 23 '12 at 18:58
  • @MikeC I don't think you understand. That lambda expression is building an expression tree against the `IQueryable` type. The underlying data store has to know how to turn that lambda into SQL which limits the type of data stores that the application can use. Understand? `customerRepository.GetById(1234)` makes no assumptions about how the data store fetches my data. Yes it's more verbose and requires more repository methods, but it means the data store could be anything from good old fashioned ADO.NET, Entity Framework, or even a NoSQL store. – CatDadCode Aug 23 '12 at 19:21
  • @MikeC and if you're not applying that lambda to an `IQueryable` then that's even worse because it means you're pulling **every single row** from the customers table into memory and then applying LINQ to Objects to filter down to the single one you want. Ick! – CatDadCode Aug 23 '12 at 19:22
  • @AlexFord Yes, I understand the difference between `IEnumerable` and `IQueryable` in this case. I simply meant that a generic repository does not require you to build LINQ expression trees above the repository level. As you pointed out, the compiler will do that for you when passing a lambda to an `IQueryable`. – mclark1129 Aug 23 '12 at 19:32
  • @MikeC Let me clarify. Let's say I need a customer by ID in one scenario and by Email in another. With your generic `GetAll(lambda)` method you are now making the assumption that the repository method will know what to do with that lambda. Let's pretend I, for some strange reason, need to go back to old fashioned ADO.NET now. I would have to pull in a `DataTable` containing every row from the table and loop through every record, passing each one into the `Func` to see if that record matches my lambda. – CatDadCode Aug 23 '12 at 19:36
  • @AlexFord To your other point. You are right that most of the generic repositories depend on the IQueryable interface, but the only assumption that makes about your data source is that it has a `QueryProvider` implemented for it. ADO.NET and EF already have them (LINQ to SQL, and LINQ to Entities). There is nothing stopping someone from creating LINQ to NoSQL. IQueryable would not prevent you from using whatever technology you wanted. And, if you think about it, when you implement `customerRepository.GetById` you are technically implementing your own rudimentary `QueryProvider` – mclark1129 Aug 23 '12 at 19:38
  • My repository would instead have two methods instead of one. `GetById(1234)` and `GetByEmail(email)`. In the method itself I then know exactly what to do to get only the record I want from the database. Even in something as antiquated as ADO.NET. – CatDadCode Aug 23 '12 at 19:38
  • that takes `.GetById` and translates it to SQL. Fundamentally, there's no difference between the two approaches. The old school way is just easier (although many times less elegant). – mclark1129 Aug 23 '12 at 19:39
  • @MikeC Yes that is the other thing I was going to say. The other alternative would be that any new data store would require a query provider. I don't see that as practical in our situation, writing a new query provider each time we change technologies. – CatDadCode Aug 23 '12 at 19:39
  • I also feel like it would be much harder to unit test a repository with such a generic fetch method. There would be no way to test every scenario. – CatDadCode Aug 23 '12 at 19:41
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/15733/discussion-between-alex-ford-and-mike-c) – CatDadCode Aug 23 '12 at 19:49
0

What I have done in the past is create child injection containers (I was using Unity) and register the data context with a ContainerControlledLifetimeManager, so that when the repositories are instantiated they always have the same data context injected into them. I then hang on to that data context, and when my "unit of work" is complete I call DataContext.SaveChanges(), flushing all the changes out to the database.
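In Unity terms, the shape of that is roughly as follows (a sketch; the repository registrations on the parent container are omitted):

```csharp
// Sketch only: one child container per logical unit of work. The
// container-controlled lifetime makes DataContext a singleton *within*
// this child, so every repository resolved here shares the same one.
using (var child = container.CreateChildContainer())
{
    child.RegisterType<DataContext>(new ContainerControlledLifetimeManager());

    var reports = child.Resolve<IReportsRepository>();
    var employees = child.Resolve<IEmployeeRepository>();

    // ... do work with the repositories ...

    // Flush everything as a single unit of work.
    child.Resolve<DataContext>().SaveChanges();
}
```

Disposing the child container at the end of the `using` block also tears down the context it owned.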

This has some other advantages, such as (with EF) local caching: if more than one repository needs the same entity, only the first repository actually causes a database round trip.

It's also a nice way to "batch up" the changes and make sure they execute as a single atomic transaction.

CodingGorilla