0

Essentially, I'm trying to exclude entities with a null child object from a query result. I'd like to do this using in the most efficient way possible, loading the smallest amount of data necessary.

I am working with a database that doesn't enforce referential integrity; some Pupils have been deleted, so, unfortunately, some of the Reports that do specify PupilID no longer have a matching pupil. I want to exclude such reports.

My original problem method is further below, but here is my core problem.

var test1 =
    DbContext.Report
        .Where(x => x.Pupil != null).ToList();

var test2 =
    DbContext.Report
        .Include(x => x.Pupil)
        .Where(x => x.Pupil != null).ToList();

if (test1.Count(x => x.Pupil == null) > 0)
{
    var bp = "hit"; // I know this doesn't work
}

if (test2.Count(x => x.Pupil == null) > 0)
{
    var bp = "hit"; // But why doesn't this? And how can I make it work...
}

if (test1.Where(x => x.Pupil != null).Count(x => x.Pupil == null) > 0)
{
    var bp = "not hit"; // Without having to do this
}

I assume it's some issue to do with EF lazy loading, but I've tried explicitly loading and including them to no avail; the only thing that works is calling.ToList() on the entire query which will evaluate the proxies and allow me to exclude them correctly, however, to do this it'll load data I do not need just to throw it away. Is there any way around this?

Original problem method:

public IEnumerable<Report> GetCurrentReportsForStaffByUsername(string username)
{
    var staffId =
        GetStaffIdFromUsername(username);

    var query = (
        from reports in this.DbContext.Reports.Include(x => x.Pupil).Where(x => x.Pupil != null)
       where reports.StaffId == staffId
          && reports.Pupil != null
      select reports)
            .Include(r => r.Pupil)
            .Where(r => r.Pupil != null);

    if (query.Any(q => q.Pupil == null))
    {
        var error = "how?!?!?!?!?!?!?!?"; // <-- Hits this
    }

    if (query.ToList().Any(q => q.Pupil == null))
    {
        var error = "really?";
    }

    return query;
}

Thank-you for taking the time :)

Matthew Hudson
  • 1,306
  • 15
  • 36
  • Possible duplicate of [Linq join iquery, how to use defaultifempty](https://stackoverflow.com/questions/19293844/linq-join-iquery-how-to-use-defaultifempty) – BugFinder Jul 02 '18 at 09:05
  • @BugFinder if you can explain how, that would be great. I cannot see how to use that in this case; I don't want to replace the null object, I want to exclude it's parent entirely from the query result. Cheers, – Matthew Hudson Jul 02 '18 at 09:09
  • the defaultifempty allows your linq to properly work and you can exclude missing values – BugFinder Jul 02 '18 at 09:12
  • @BugFinder I simply don't see it, please can you post sample implementation. Even if just a copied version of my code with it added, I don't mind padding your answer out as soon as I understand it, but this comment really isn't helpful and I don't appreciate the close request with no explanation. – Matthew Hudson Jul 02 '18 at 09:15
  • @BugFinder Fundamentally; I don't want to replace null pupil with a default, I want to exclude the entire report to begin with. If this is what you mean for me to do then please explain, otherwise please retract your close request and duplicate question report. – Matthew Hudson Jul 02 '18 at 09:21
  • 1
    @MatthewHudson Looks like you are using include as a join, then later checking if any reports have Pupils that are null, which it has. maybe try with a `join p in Pupils on reports.PupilId equals p.PupilId` you can keep the first `Reports.Include(x=>x.Pupil)` the others shouldnt be needed. – Jim Wolff Jul 02 '18 at 09:39
  • @JimWolff Thanks for your comment Jim, realisation hit me like a steam train and am appalled at myself for not thinking/trying it sooner. I've given it a go but it wouldn't let me check for null on the pupils object as a whole, nor checking the PupilId is greater than zero (this would return null pupils still). It got me thinking though and I've found a solution, will post in a moment. Thanks again for your help :) – Matthew Hudson Jul 02 '18 at 10:20

2 Answers2

0

Not the nicest looking code, but it does what it's supposed to;

var test2 =
    DbContext.Report
        .Where(r => DbContext.Pupils.Any(p => p.PupilId == r.PupilId)).ToList();

Or in linq to sql;

public IEnumerable<Report> GetCurrentReportsForStaffByUsername(string username)
{
    var staffId =
        GetStaffIdFromUsername(username);

    var query = (
        from reports in this.DbContext.Reports
       where reports.StaffId == staffId
          && this.DbContext.Pupils.Any(p => p.PupilID == reports.PupilID) // <- this
      select reports)

    return query;
}
Matthew Hudson
  • 1,306
  • 15
  • 36
0

Normally you would do something like this: (building on your example)

public IEnumerable<Report> GetCurrentReportsForStaffByUsername(string username)
{
    var staffId =
        GetStaffIdFromUsername(username);

    var query = (
        from reports in this.DbContext.Reports
        join pupil in this.DbContext.Pupils on reports.PupilID equals pupil.PupilID
        where reports.StaffId == staffId
        select reports)

    return query;
}

This will become something like this in sql:

SELECT * FROM Report r
INNER JOIN Pupil p ON r.PupilID = p.PupilID

Which seems like what you want. I'd also suggest a tool like Linqpad for trying out these things, since you can write c# code and it has a tab to show what that turns into in SQL (although it's not always 1-to-1, but it gives you an idea)

That way you can try different approaches and approximately see what the results are.

Jim Wolff
  • 5,052
  • 5
  • 34
  • 44