
I am working on an application, trying to improve performance. Obviously I will be doing my own profiling and testing, but I would like to know if there is a "consensus" or known best practice.

In the old SQL days, one of the main things to do to improve efficiency was to not select data you aren't going to consume. I'm trying to go down that route with EF6.

In this particular case, I have a master-detail-detail relationship where I need to render some data about the parent, child, and grandchild on the screen.

My application is n-tier, with an MVC front end and a Web API REST back end. These entities are ultimately going to be serialized as JSON and sent over the REST connection back to the MVC controller, where they will be rendered to the screen. In this case I will not be updating the entities from this flow, so I don't need to worry about merging partial entities back into the repository (in those cases, I would probably send over the full entity for ease of maintenance).

So, the original straightforward EF code I wrote looks like this:

Repository.GetAll()
          .AsNoTracking()
          .Include("Children")
          .Include("Children.GrandChildren")
          .ToList();

However, I am only actually consuming a subset of the properties of these entities, and some of the unused properties can be rather large (big chunks of XML, etc.).

Here is a first pass at trying to project out only the fields I need (for the example here, I have cut out and renamed most of the fields I would actually select to improve readability, but in general I'm using, let's say, 5-20% of the full entities):

var projection = Repository.GetAll()
            .AsNoTracking()
            .Select(r => new
            {
                r.Id,
                r.RandomId,
                r.State,
                r.RequestType,
                r.CreatedDate,
                r.CreatedBy,
                Children = r.Children.Select(r2 => new
                {
                    r2.Id,
                    r2.Status,
                    GrandChildren = r2.GrandChildren.Select(r3 => new
                    {
                        r3.Id,
                        r3.Status,
                        r3.GrandChildType
                    })
                }),
            }
            ).ToList();

This is obviously using anonymous types (I believe this is required in EF; there isn't a way to project into a named type?). (Edit: apparently you can project into a non-mapped named type, but in this case the return type of the query is a mapped type. So I could create a DTO, but that's even more code to maintain.)

So then I have to get back into my concrete types. I could certainly generate DTOs that only have the properties needed, but I don't think that changes the fundamental logic used, nor probably the performance characteristics.

I tried my standbys of AutoMapper and ValueInjecter, but neither one seemed to fit the bill perfectly here (a deep clone of heterogeneous types with matching names), so I went dirty:

var json = projection.Select(JsonConvert.SerializeObject).ToList();

var mapped = json.Select(JsonConvert.DeserializeObject<Parent>).ToList();

This is somewhat lame, since it's just going to be serialized again as part of the REST call. There is probably a way I can override the Web API calls to say I am returning already-serialized data, which would let me skip the rehydration into the entity type (as all of the property names match, the REST client should be able to rehydrate the anonymous type as if it were the real type, the same way the snippet above does).
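Something like this is roughly what I have in mind (a sketch only; `BuildProjectedJson` is a hypothetical placeholder for the projection + serialization above, while `HttpResponseMessage`/`StringContent` are the standard Web API 2 types for bypassing the media-type formatters):

```csharp
// Sketch of a Web API 2 action returning pre-serialized JSON so the
// framework does not serialize it a second time. BuildProjectedJson is a
// hypothetical helper standing in for the projection + serialization above.
public HttpResponseMessage GetParents()
{
    string json = BuildProjectedJson();

    return new HttpResponseMessage(HttpStatusCode.OK)
    {
        // StringContent bypasses the media-type formatters entirely
        Content = new StringContent(json, Encoding.UTF8, "application/json")
    };
}
```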

BUT all this seems like a lot of work, less maintainable code, and more possible places to have bugs, etc., for a use case that Entity Framework really does not seem to want to support. But my old-school instincts can't let go of the idea that I'm selecting, serializing, and transferring a whole lot of data that ultimately I'm not going to consume.

Does this produce sane SQL under the covers? Is that worth the double serialization (assuming I don't figure out how to override Web API to let me hand it the data)?
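(For checking the generated SQL myself, I'm assuming EF6's built-in `Database.Log` hook is the easiest route; `context` here stands for whatever DbContext the repository wraps:)

```csharp
// EF6 ships a logging hook that emits every SQL statement it generates.
// "context" is the underlying DbContext (assumed name).
context.Database.Log = sql => System.Diagnostics.Debug.WriteLine(sql);

// Run the projection, then inspect the emitted SELECT in the debug output.
```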

I suppose my other choice would be to refactor all the entities so that the unused properties are in different sub-entities that I can just not include, but that would be a lot of rework throughout the system (versus being able to surgically improve performance at critical points), and it also seems like a poor choice to design entities around the ORM I happen to be using versus standard normalization rules, etc.

marc_s
Jason Coyne

4 Answers


You can project into named types:

var projection = Repository.GetAll()
        .AsNoTracking()
        .Select(r => new ParentModel()
        {
            Id = r.Id,
            RandomId = r.RandomId,
            State = r.State,
            RequestType = r.RequestType,
            CreatedDate = r.CreatedDate,
            CreatedBy = r.CreatedBy,
            Children = r.Children.Select(r2 => new ChildModel()
            {
                Id = r2.Id,
                Status = r2.Status,
                GrandChildren = r2.GrandChildren.Select(r3 => new GrandChildModel()
                {
                    Id = r3.Id,
                    Status = r3.Status,
                    GrandChildType = r3.GrandChildType
                }).ToList()
            }).ToList()
        })
        .ToList();

But yes, it does make sense to not select fields you're not going to need.

Lately, when using the DTO approach or models, I'll add a static Func to my model and use that in my context projections. In your case it would look something like:

public class ParentModel
{
    public int Id { get; set; }
    public int RandomId { get; set; }
    public string State { get; set; }
    public List<ChildModel> Children { get; set; }
    public static Func<Parent, ParentModel> Project = item => new ParentModel
    {
        Id = item.Id,
        RandomId = item.RandomId,
        State = item.State,
        Children = item.Children.Select(ChildModel.Project).ToList()
    };
}
public class ChildModel
{
    public int Id { get; set; }
    public int Status { get; set; }
    public string State { get; set; }
    public List<GrandChildModel> GrandChildren { get; set; }

    public static Func<Child, ChildModel> Project = item => new ChildModel
    {
        Id = item.Id,
        Status = item.Status,
        GrandChildren = item.GrandChildren.Select(GrandChildModel.Project).ToList()
    };
}
public class GrandChildModel
{
    public int Id { get; set; }
    public int Status { get; set; }
    public int GrandChildType { get; set; }
    public static Func<GrandChild, GrandChildModel> Project = item => new GrandChildModel
    {
        Id = item.Id,
        Status = item.Status,
        GrandChildType = item.GrandChildType
    };
}

Then your projection code would just look like:

var projection = Repository.GetAll()
      .AsNoTracking()
      .Include("Children")
      .Include("Children.GrandChildren")
      .Select(ParentModel.Project)
      .ToList();
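One caveat: `ParentModel.Project` is a plain `Func`, so `.Select(ParentModel.Project)` runs in memory after the `Include`s have pulled the full entities. If the goal is to keep the wide columns out of the SQL entirely, the same shape can be written as an expression tree (a sketch only; nested static projections can't be composed inside an expression tree, so the child shapes are inlined):

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class ParentProjections
{
    // Expression-tree version of ParentModel.Project. Because this is an
    // Expression<Func<...>>, Queryable.Select hands it to the EF provider,
    // which translates it into the SQL SELECT list - the Includes and the
    // wide, unused columns are no longer fetched at all.
    public static readonly Expression<Func<Parent, ParentModel>> ToModel =
        p => new ParentModel
        {
            Id = p.Id,
            RandomId = p.RandomId,
            State = p.State,
            Children = p.Children.Select(c => new ChildModel
            {
                Id = c.Id,
                Status = c.Status,
                GrandChildren = c.GrandChildren.Select(g => new GrandChildModel
                {
                    Id = g.Id,
                    Status = g.Status,
                    GrandChildType = g.GrandChildType
                }).ToList()
            }).ToList()
        };
}

// Usage - no Includes required:
// var projection = Repository.GetAll()
//       .AsNoTracking()
//       .Select(ParentProjections.ToModel)
//       .ToList();
```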
JamieD77
  • I suppose you mean `Id = r.Id,` and so on. When not initializing an anonymous type, you must specify the property names. – CodeCaster Sep 30 '15 at 21:31
  • I'll try it with the concrete types, but I swear I read something that said LINQ to SQL only supported anonymous projections. – Jason Coyne Sep 30 '15 at 21:36
  • Indeed, when I try the named types I get "Additional information: The entity or complex type 'ConcreteType' cannot be constructed in a LINQ to Entities query." – Jason Coyne Sep 30 '15 at 21:42
  • https://stackoverflow.com/questions/5325797/the-entity-cannot-be-constructed-in-a-linq-to-entities-query – Jason Coyne Sep 30 '15 at 21:46
  • I created some DTOs that inherit from the real entities and marked them as NotMapped. Then I added Cast<>() calls to the three Select statements. There is overhead from all the casts, but it's probably less overhead than the double serialization. – Jason Coyne Sep 30 '15 at 22:00
  • @CodeCaster you're right, I just copied his answer. I'll update. Thanks! – JamieD77 Oct 01 '15 at 13:50

Use table splitting, which allows you to split a table into more than one entity without modifying the underlying table. The "less accessed" properties can be lazy-loaded on demand or loaded eagerly, behaving like any other navigation property. Note: the key here is that the additional entities use their primary key as the foreign key to the primary entity.
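For reference, a minimal EF6 fluent-mapping sketch of table splitting (the class and table names are made up; the essential parts are the shared table name and the shared primary key):

```csharp
using System.Data.Entity;

public class Parent
{
    public int Id { get; set; }
    public string State { get; set; }
    // Navigation to the rarely-used, heavy columns
    public virtual ParentExtra Extra { get; set; }
}

public class ParentExtra
{
    public int Id { get; set; }              // same PK value as Parent
    public string BigXmlPayload { get; set; }
    public virtual Parent Parent { get; set; }
}

public class AppContext : DbContext
{
    public DbSet<Parent> Parents { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Both entities share a key and map to the same physical table,
        // so EF treats them as one row split across two entities.
        modelBuilder.Entity<Parent>()
            .HasRequired(p => p.Extra)
            .WithRequiredPrincipal(e => e.Parent);

        modelBuilder.Entity<Parent>().ToTable("Parents");
        modelBuilder.Entity<ParentExtra>().ToTable("Parents");
    }
}
```

Queries that never touch `p.Extra` then never select `BigXmlPayload` from the database.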

Moho
  • I've thought of this one, but it would also require wider-scale changes to the app versus something I can do surgically at the pain points. If I was starting from scratch, though, I'd strongly consider this. – Jason Coyne Sep 30 '15 at 21:56
  • Table splitting is the proper design for your situation and would be far easier to maintain going forward than any other solution. – Moho Sep 30 '15 at 22:00
  • Table splitting will work for the large columns, I think, but for the general case of a single entity with many small columns that I need random subsets of, it won't really work (unless you can define the same properties in multiple split entities to get arbitrary subsets). – Jason Coyne Sep 30 '15 at 22:05
  • You are correct, but if the columns are small, what's the problem? – Moho Sep 30 '15 at 22:07

Of course, projections are better. There are two approaches.

Use DTO or Split Tables

The problem is that there is too much to code and too much to manage; however, this is the better approach, as you can visualize all entities and refactor them easily when needed.

Or use a separate DbContext with a limited model as a read-only context.
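A sketch of the read-only-context idea (names and connection string are assumptions; the key parts are the trimmed entity and the disabled tracking/initialization):

```csharp
using System.Data.Entity;

// Slim entity exposing only the columns the read side needs;
// it maps to the same table as the full Parent entity.
public class ParentSummary
{
    public int Id { get; set; }
    public string State { get; set; }
}

public class ReadOnlyContext : DbContext
{
    public DbSet<ParentSummary> Parents { get; set; }

    public ReadOnlyContext() : base("name=MainConnection") // same connection (assumed name)
    {
        // Read-only usage: no change tracking, no lazy-load proxies
        Configuration.AutoDetectChangesEnabled = false;
        Configuration.ProxyCreationEnabled = false;
        // Don't let this context try to create/migrate the shared database
        Database.SetInitializer<ReadOnlyContext>(null);
    }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        modelBuilder.Entity<ParentSummary>().ToTable("Parents");
    }
}
```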

Dynamic DTO Proxy

Instead, in ASP.NET MVC, I created a REST proxy layer which allows me to query dynamically without creating too many DTOs (Data Transfer Objects). In my approach, I create a query like:

/app/entity/message/query
    ?query={UserID:2}
    &orderBy=DateReceived+DESC
    &fields={MessageID:'',Subject:''}
    &start=10
    &size=10

query expects an anonymous object as a filter.

Here, I am passing the required fields in the query string. In this design, my API layer creates a projection using reflection and the Expression API. This is somewhat complex, but it saves me from creating many permutations and combinations of DTOs.

The source code is here: https://github.com/neurospeech/atoms-mvc.net

However, this code was also used to create an Entity Framework-like context in JavaScript, which would then load related navigation properties asynchronously as separate queries, but once again with limited fields.

There is a class, LinqRuntimeTypeBuilder.cs, which contains the source code to build types dynamically that can be used in queries: https://github.com/neurospeech/atoms-mvc.net/blob/master/src/Mvc/LinqRuntimeTypeBuilder.cs
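The core trick, greatly simplified (all names here are made up for illustration), is building the `Select` lambda at runtime from the requested field names. A minimal LINQ-to-Objects sketch that projects into dictionaries instead of a generated type:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Sample entity for demonstration (made up)
public class Message
{
    public int MessageID { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
}

public static class DynamicProjector
{
    // Builds x => new Dictionary<string, object> { { "MessageID", (object)x.MessageID }, ... }
    // at runtime from the requested field names, then applies it via Select.
    public static IQueryable<Dictionary<string, object>> SelectFields<T>(
        IQueryable<T> source, params string[] fields)
    {
        var x = Expression.Parameter(typeof(T), "x");
        var add = typeof(Dictionary<string, object>).GetMethod("Add");

        var body = Expression.ListInit(
            Expression.New(typeof(Dictionary<string, object>)),
            fields.Select(f => Expression.ElementInit(
                add,
                Expression.Constant(f),
                Expression.Convert(Expression.Property(x, f), typeof(object)))));

        return source.Select(
            Expression.Lambda<Func<T, Dictionary<string, object>>>(body, x));
    }
}
```

Against a real EF query you would need a generated type (which is what LinqRuntimeTypeBuilder provides), since dictionary initializers don't translate to SQL; this sketch only demonstrates the expression construction, runnable against in-memory data, e.g. `DynamicProjector.SelectFields(messages.AsQueryable(), "MessageID", "Subject")`.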

This approach is a little longer, as it requires setting up a "firewall" around entities to control access.

Akash Kava

You might be interested in the Command-Query Responsibility Segregation (CQRS) approach, where you have one model for the write side and multiple separate query models tailored for your reads. The write model will typically reflect a single entire EF entity, while the read models are any projections and aggregations of data from as many entities as you want.

Yes, that means there will be multiple read-model DTOs with just the properties you need in them, but it's much better in terms of consistency and expressiveness than sending client code a big bag of data and letting it cherry-pick what it wants, IMO.

guillaume31
  • I agree with this approach, but it doesn't really answer the EF question, because I would have to somehow select the data to load into the DTOs which would require one of the techniques in the other answers. – Jason Coyne Oct 01 '15 at 19:03
  • It does answer the title of your question though ;) – guillaume31 Oct 02 '15 at 07:09