
I have a very deep relational tree in my model design: the root entity contains a collection of entities, each of which contains further collections of other entities, and so on. I am developing a business layer that other developers have to use to perform operations, including getting and saving data.

I am trying to decide on the best strategy for this situation. I cannot let EF resolve the whole dependency tree whenever an entity is retrieved, since that would produce a lot of useless JOINs (useless because I may not need the data at the next level).

  • If I disable lazy loading and enforce eager loading for what is needed, it works as expected. But if another developer calls child.Parent.Id instead of child.ParentId while implementing something new (a requirement or feature not considered at the beginning), they will get a NullReferenceException if that dependency was not included. That is bad, but it is a "fast error" that can be fixed straight away.

  • If I enable lazy loading, accessing child.Parent.Id instead of child.ParentId will end in a standalone query to the DB each time it is accessed. It won't fail, but that is worse: there is no error, only degraded performance, and all the code would have to be reviewed to find it.

I am not happy with either of these two solutions.

  • I am not happy having entities that contain null or empty collections when that does not reflect the real data.

  • I am not happy letting EF run arbitrary queries against the DB at any moment. I would like to get all the information in one shot if possible.

So I came up with several possible solutions that involve disabling lazy loading and enforcing eager loading, but I am not sure which is better:

  • I can create an EntityBase class that contains the table's data without the collections, so they cannot be accessed, plus concrete implementations that contain the relationships. The problem is that this is not very flexible, since C# does not allow multiple inheritance.

  • I can create interfaces that "mask" the objects, hiding the properties that are not available in a given method call. For example, if I have a User.Roles property, I do not need to resolve .Roles in order to show a grid with all users, so I could create an interface 'IUserData' that does not contain that property.
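
A minimal sketch of the interface idea, to make it concrete. Only User.Roles and IUserData come from the description above; the Role class, the Name property and the UserGrid helper are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical related entity.
public class Role
{
    public string Name { get; set; }
}

// Read-only view exposing only the scalar data of a user.
// It deliberately has no Roles member.
public interface IUserData
{
    int Id { get; }
    string Name { get; }
}

// The full entity still carries the navigation property.
public class User : IUserData
{
    public int Id { get; set; }
    public string Name { get; set; }
    public ICollection<Role> Roles { get; set; }
}

public static class UserGrid
{
    // Callers that receive IUserData cannot write user.Roles:
    // the mistake becomes a compile error instead of a runtime one.
    public static string Describe(IUserData user)
    {
        return user.Id + ": " + user.Name;
    }
}
```

The mask is only as strong as the static type: a caller can still cast back to User, so this guides developers rather than enforcing anything.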

But I do not know whether this additional work is worth it; maybe a fast NullReferenceException indicating "This property has not been loaded" would be enough.

Would it be possible to throw a specific exception type if the property is virtual and has not been overridden/set?

What method do you use?

Thanks.

vtortola

1 Answer


In my opinion, you are trying to protect the developers from having to understand what they are doing when they access data and what performance implications it can have. That might result in an unnecessarily convoluted API with a lot of helper classes, base classes, interfaces, etc.

If a developer uses user.MiddleName.Trim() and MiddleName is null, he gets a NullReferenceException because he did something wrong: he either didn't check for null or didn't make sure that MiddleName is set to a value. The same applies when he accesses user.Roles and gets a NullReferenceException: he didn't check for null or didn't call the appropriate method of your API that loads the Roles of the user.

I would say: Explain how navigation properties work and that they have to be requested explicitly and let the application crash if a developer doesn't follow the rules. He needs to understand the mistake and fix it.

To help, you could make loading related data explicit in the API, for example with methods like:

public User GetUser(int userId);
public User GetUserWithRoles(int userId);

Or:

public User GetUser(int userId, params Expression<Func<User,object>>[] includes);

which could be called with:

var userWithoutRoles = layer.GetUser(1);
var userWithRoles = layer.GetUser(2, u => u.Roles);
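
One way the params overload could be implemented is sketched below. It assumes the EntityFramework package (the Include extension method from System.Data.Entity) and a context with lazy loading switched off; UsersContext, UserLayer and IncludeAll are hypothetical names, not part of any existing API.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Entity;            // DbContext, DbSet, Include (EntityFramework package)
using System.Linq;
using System.Linq.Expressions;

public class Role
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Role> Roles { get; set; }
}

// Hypothetical context; lazy loading is disabled as discussed in the question.
public class UsersContext : DbContext
{
    public UsersContext()
    {
        Configuration.LazyLoadingEnabled = false;
    }

    public DbSet<User> Users { get; set; }
}

public static class QueryExtensions
{
    // Applies each include expression in turn.
    // With no includes, the query is returned unchanged.
    public static IQueryable<T> IncludeAll<T>(
        this IQueryable<T> query,
        params Expression<Func<T, object>>[] includes) where T : class
    {
        return includes.Aggregate(query, (current, include) => current.Include(include));
    }
}

// Hypothetical business-layer class exposing the method shown above.
public class UserLayer
{
    public User GetUser(int userId, params Expression<Func<User, object>>[] includes)
    {
        using (var context = new UsersContext())
        {
            // Returns null if no user has the given id.
            return context.Users
                .IncludeAll(includes)
                .SingleOrDefault(u => u.Id == userId);
        }
    }
}
```

Because the context is disposed before the user is returned, any navigation property that was not passed in includes stays null, which is the "fast error" behaviour described in the question.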

You could also leverage explicit loading instead of lazy loading to force the developers to call a method when they want to load a navigation property and not just access the property.
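
A minimal sketch of explicit loading through the DbContext API follows. It assumes the EntityFramework package and that the user instance is attached to the context passed in; UsersContext and EnsureRolesLoaded are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Data.Entity;            // DbContext, DbSet (EntityFramework package)

public class Role
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public class User
{
    public int Id { get; set; }
    public string Name { get; set; }
    public virtual ICollection<Role> Roles { get; set; }
}

// Hypothetical context with lazy loading disabled.
public class UsersContext : DbContext
{
    public UsersContext()
    {
        Configuration.LazyLoadingEnabled = false;
    }

    public DbSet<User> Users { get; set; }
}

public static class ExplicitLoading
{
    // Loads user.Roles with a separate query, but only if the collection
    // has not already been loaded in this context.
    // Assumes `user` is tracked by `context`.
    public static void EnsureRolesLoaded(UsersContext context, User user)
    {
        var roles = context.Entry(user).Collection(u => u.Roles);
        if (!roles.IsLoaded)
        {
            roles.Load();
        }
    }
}
```

The database round trip is then visible at the call site as a method call, instead of being hidden behind a property access.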

Two additional remarks:

"...lazy loading ... will end in a standalone query to the DB each time it is accessed."

"...and not yet loaded" to complete this. If the navigation property has already been loaded within the same context, accessing the property again won't trigger a query to the database.

"I would like to get all the information in one shot if possible."

Multiple queries do not necessarily result in worse performance than one query with a lot of Includes. In fact, complex eager loading can lead to data multiplication on the wire and make entity materialization very time consuming, slower than multiple lazy or explicit loading queries. (In one example, a query's performance improved by a factor of 50 after changing it from a single query with Includes to more than 1000 queries without Include.) The bottom line is: you cannot reliably predict the best loading strategy for a specific situation without measuring the performance (if the performance matters in that situation).
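
As a rough way to do that measuring, a small timing helper along these lines could be used. LoadTiming and Measure are hypothetical names; the helper only relies on System.Diagnostics.Stopwatch and is a sketch, not a proper benchmark.

```csharp
using System;
using System.Diagnostics;

public static class LoadTiming
{
    // Runs `load` once untimed as a warm-up (the first call through an ORM
    // is usually slower), then returns the average time of `runs` timed calls.
    public static TimeSpan Measure(Action load, int runs)
    {
        if (load == null) throw new ArgumentNullException("load");
        if (runs <= 0) throw new ArgumentOutOfRangeException("runs");

        load(); // warm-up, not timed

        var watch = Stopwatch.StartNew();
        for (var i = 0; i < runs; i++)
        {
            load();
        }
        watch.Stop();

        return TimeSpan.FromTicks(watch.Elapsed.Ticks / runs);
    }
}
```

For example, LoadTiming.Measure(() => layer.GetUser(2, u => u.Roles), 20) could be compared against a variant that loads the user first and the roles in a second query, ideally against a database with realistic data volumes.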

Slauma
  • This is a fantastic answer. I like that method with the list of expressions to load. You made a good point with the "complex eager loading" issue, I will read those links carefully. Thanks a lot. – vtortola Sep 02 '12 at 21:52