There are a few things I think need improving here.
Includes are not meant for filtering.
It's simply not the correct place to do so.
Include
is meant to automatically retrieve all linked entities. Since you don't want all of the entities (you only want a subset), you shouldn't be using an Include
.
Alternatively, you could still use an Include
, so long as you're happy to remove the unwanted entries in memory (i.e. after they've been loaded). But I assume you don't want that.
Instead, you could use an explicit Select
statement. As a simple example:
context.Projects
.Where(p => p.Id == projectId)
.Select(p => new ConsultantSearchResult() {
Project = p,
ConsultantsNamedBob = p.Consultants.Where(c => c.FirstName == "Bob")
}).ToList();
Notice the absence of the Include
. As I said before, Include
is used for automatically (and implicitly) loading the related data. But because you explicitly stated the data you want in the Select
, there's no need for an implicit inclusion anymore. EF will give you exactly what you ask for.
Your Select
is unintuitive
I think you're expecting something different than what you're getting. Looking at the code:
return context.Timesheets //1
.Where(...) //2
.Select(t => t.Project) //3
Look at what happens:
- You select all timesheets.
- You filter the timesheets and are left with a subset of timesheets
- You get a list of the projects of each timesheet.
If your filtering (step 2) leaves you with multiple timesheets from the same project, then .Select(t => t.Project)
will give you multiple instances of the same project. And that's not good.
There are two exceptions here:
- You know that you are going to find one timesheet in total. But then you should be using
First
, Single
, FirstOrDefault
or SingleOrDefault
. You should only use Where
if it's possible that you get more than one result.
- You expect more than one timesheet, but you know that you'll never find two timesheets from the same project (thus never creating duplicates when you call the
Select
). I would assume (by reading the entity names) that it's possible for a specific consultant to have multiple timesheets for the same project, but maybe that's not true.
- If my inference is correct, then you will experience the problem with duplicate projects after you do the
Select
.
- If my inference is not correct, then I would expect a much stronger relationship between timesheets and consultant, as every project consultant would have exactly 1 (or no) timesheet, never more than 1. But your current data structure lacks any real relationship between timesheets and consultants.
A quick solution would be to use Distinct
:
return context.Timesheets
.Where(...)
.Select(t => t.Project)
.Distinct()
But I personally think that a better solution would be to invert the lookup: start with projects, filter the projects on their timesheets (rather than filtering the timesheets):
return context.Projects
.Include(p => p.Timesheets)
.Where(p => p.Timesheets.Any(t => t.UserId == userId && ...))
.ToList();
This precludes the issue with the duplicate projects. Note that this doesn't yet solve your "filtered Include" problem.
Separate queries in the same context
This was also mentioned in the comments. This is a viable option, but I find it a dirty approach that is going to create unintuitive code down the line.
A simple example:
context.Configuration.LazyLoadingEnabled = false;
var parent = context.Set<Entity>().First(e => e.Name = "ABC");
// Load relations in separate query
context.Set<Child>()
.Where(c => c.Parent.Name == "ABC")
.OrderBy(c => c.Name) // You can at least try it but as mentioned above it may not work in all scenarios
.Load();
// Now parent.Children collection should be filled
The example uses OrderBy
instead of Where
, but both will work the same way.
Even though you queried the children and the parent separately, their navigational properties will continuously be updated because you're running your queries in the same context.
It's a viable option for you, but I'm a bit apprehensive of this code, as it is in no way readable that the second query alters the result of the first query.
To me, this feels equally dirty to e.g. having business logic in the get
or set
of a property. It works, but it leads to unexpected behavior and will make debugging really difficult.
Note that it may be clear to you what is happening behind the scenes, but it's easy for a different developer to gloss over it when looking at the code.
I personally don't like this, but your opinion may differ.
Your incomplete data structure is making it complicated.
Looking at your code sample, I think there's a bit of an issue with your data consistency. You are using the userId
filter in two places:
- Timesheet:
t => t.UserId == userId
- Consultant:
c => c.UserId == userId
If timesheets are connected to a consultant, then there should be a relationship between these two entities. As it currently stands, your project has a list of timesheets and a list of consultants, with no discernible relationship between timesheets and consultant.
This is why your lookup is co complicated. You're trying to mock a relationship that isn't there.
If the relationship did exist, it would be much easier to look everything up:
return context.Timesheets
.Include(t => t.Project)
.Include(t => t.Project.Account)
.Include(t => t.Consultant)
.Where(t => t.Consultant.UserId == userId && t.SubjectDate == date && t.WorkingDaySchedules.Count() > 0)
.ToList()
And then you get what you're looking for. You no longer have to do two separate userId
checks, you're no longer required to "manually synchronize" the fake relationship, the lookup process is much more streamlined and readable.
Minor comment
Maybe something you didn't know yet. You can rewrite
t.WorkingDaySchedules.Count() > 0
as
t.WorkingDaySchedules.Any() //is there at least one item in the collection?
With the added benefit that you can add filters if you need to:
t.WorkingDaySchedules.Any(wds => wds.IsActive) //is there at least one item in the collection that meets the condition?