50

I am using the following LINQ query for my profile page:

var userData = from u in db.Users
                        .Include("UserSkills.Skill")
                        .Include("UserIdeas.IdeaThings")
                        .Include("UserInterests.Interest")
                        .Include("UserMessengers.Messenger")
                        .Include("UserFriends.User.UserSkills.Skill")
                        .Include("UserFriends1.User1.UserSkills.Skill")
                        .Include("UserFriends.User.UserIdeas")
                        .Include("UserFriends1.User1.UserIdeas")
                               where u.UserId == userId
                               select u;

It has a long object graph and uses many Includes. It is running perfect right now, but when the site has many users, will it impact performance much?

Should I do it in some other way?

Captain Delano
  • 427
  • 1
  • 4
  • 12
teenup
  • 7,459
  • 13
  • 63
  • 122

5 Answers5

88

A query with includes returns a single result set and the number of includes affect how big data set is transfered from the database server to the web server. Example:

Suppose we have an entity Customer (Id, Name, Address) and an entity Order (Id, CustomerId, Date). Now we want to query a customer with her orders:

var customer = context.Customers
                      .Include("Orders")
                      .SingleOrDefault(c => c.Id == 1);

The resulting data set will have the following structure:

 Id | Name | Address | OrderId | CustomerId | Date 
---------------------------------------------------
  1 |  A   |   XYZ   |    1    |     1      | 1.1.
  1 |  A   |   XYZ   |    2    |     1      | 2.1.

It means that Cutomers data are repeated for each Order. Now lets extend the example with another entities - 'OrderLine (Id, OrderId, ProductId, Quantity)andProduct (Id, Name)`. Now we want to query a customer with her orders, order lines and products:

var customer = context.Customers
                      .Include("Orders.OrderLines.Product")
                      .SingleOrDefault(c => c.Id == 1);

The resulting data set will have the following structure:

 Id | Name | Address | OrderId | CustomerId | Date | OrderLineId | LOrderId | LProductId | Quantity | ProductId | ProductName
------------------------------------------------------------------------------------------------------------------------------
  1 |  A   |   XYZ   |    1    |     1      | 1.1. |     1       |    1     |     1      |    5     |    1      |     AA
  1 |  A   |   XYZ   |    1    |     1      | 1.1. |     2       |    1     |     2      |    2     |    2      |     BB
  1 |  A   |   XYZ   |    2    |     1      | 2.1. |     3       |    2     |     1      |    4     |    1      |     AA
  1 |  A   |   XYZ   |    2    |     1      | 2.1. |     4       |    2     |     3      |    6     |    3      |     CC

As you can see data become quite a lot duplicated. Generaly each include to a reference navigation propery (Product in the example) will add new columns and each include to a collection navigation property (Orders and OrderLines in the example) will add new columns and duplicate already created rows for each row in the included collection.

It means that your example can easily have hundreds of columns and thousands of rows which is a lot of data to transfer. The correct approach is creating performance tests and if the result will not satisfy your expectations, you can modify your query and load navigation properties separately by their own queries or by LoadProperty method.

Example of separate queries:

var customer = context.Customers
                      .Include("Orders")
                      .SingleOrDefault(c => c.Id == 1);
var orderLines = context.OrderLines
                        .Include("Product")
                        .Where(l => l.Order.Customer.Id == 1)
                        .ToList();

Example of LoadProperty:

var customer = context.Customers
                      .SingleOrDefault(c => c.Id == 1);
context.LoadProperty(customer, c => c.Orders);

Also you should always load only data you really need.

Edit: I just created proposal on Data UserVoice to support additional eager loading strategy where eager loaded data would be passed in additional result set (created by separate query within the same database roundtrip). If you find this improvement interesting don't forget to vote for the proposal.

user1987392
  • 3,921
  • 4
  • 34
  • 59
Ladislav Mrnka
  • 360,892
  • 59
  • 660
  • 670
  • 1
    +5 (if I could) I wasn't aware of that and used to work with eager loading rather naively. That's a really good example how much knowledge about the "R" in "ORM" is still necessary to use an ORM effectively - unfortunately... – Slauma Apr 02 '11 at 18:20
15

(You can improve performance of many includes by creating 2 or more small data request from data base like below.

According to my experience,Only can give maximum 2 includes per query like below.More than that will give really bad performance.

var userData = from u in db.Users
                        .Include("UserSkills.Skill")
                        .Include("UserIdeas.IdeaThings")
                        .FirstOrDefault();

 userData = from u in db.Users
                    .Include("UserFriends.User.UserSkills.Skill")
                    .Include("UserFriends1.User1.UserSkills.Skill")
                    .FirstOrDefault();

Above will bring small data set from database by using more travels to the database.

Martijn Pieters
  • 1,048,767
  • 296
  • 4,058
  • 3,343
Sampath
  • 63,341
  • 64
  • 307
  • 441
  • @MikeCole Thanks at least you have benefited of this post. – Sampath Apr 25 '13 at 09:29
  • 1
    Weirdly, you are right, it does work. In my case the combined single query failed to correct sql, but two separate ones work. – Ian Mar 01 '17 at 14:44
8

Yes it will. Avoid using Include if it expands multiple detail rows on a master table row.

I believe EF converts the query into one large join instead of several queries. Therefore, you'll end up duplicating your master table data over every row of the details table.

For example: Master -> Details. Say, master has 100 rows, Details has 5000 rows (50 for each master).

If you lazy-load the details, you return 100 rows (size: master) + 5000 rows (size: details).

If you use .Include("Details"), you return 5000 rows (size: master + details). Essentially, the master portion is duplicated over 50 times.

It multiplies upwards if you include multiple tables.

Check the SQL generated by EF.

Stephen Chung
  • 14,497
  • 1
  • 35
  • 48
  • 1
    +1 I found this myself. It's always better to create a number of smaller queries and execute them separately than it is to run them one by one. But what's good is that by the magic of EF, it automatically builds up the object graph for you. So if you load your users in one query, then load your skills, they'll automatically appear in each other's navigation properties. (I'm assuming this is EF in general as I use Code First). – djdd87 Apr 02 '11 at 08:48
  • @Generic Type Tea, I believe it is general for EF. In fact, I think they build the navigation properties upon first access... – Stephen Chung Apr 02 '11 at 09:09
3

The result of include may change: it depend by the entity that call the include method.

Like the example proposed from Ladislav Mrnka, suppose that we have an entity

Customer (Id, Name, Address)

that map to this table:

Id  |  Name   | Address
-----------------------
C1  |  Paul   |   XYZ   

and an entity Order (Id, CustomerId, Total)

that map to this table:

Id |  CustomerId  | Total
-----------------------
O1 |      C1      |  10.00
O2 |      C1      |  13.00

The relation is one Customer to many Orders


Esample 1: Customer => Orders

var customer = context.Customers
                      .Include("Orders")
                      .SingleOrDefault(c => c.Id == "C1");

Linq will be translated in a very complex sql query.

In this case the query will produce two record and the informations about the customer will be replicated.

 Customer.Id   |   Customer.Name |    Order.Id |  Order.Total
-----------------------------------------------------------
     C1        |       Paul      |       O1    |    10.00     
     C1        |       Paul      |       O2    |    13.00   

Esample 2: Order => Customer

var order = context.Orders
                      .Include("Customers")
                      .SingleOrDefault(c => c.Id == "O1");

Linq will be translated in a simple sql Join.

In this case the query will produce only one record with no duplication of informations:

 Order.Id |  Order.Total |  Customer.Id   |   Customer.Name
-----------------------------------------------------------
     O1   |    10.00     |      C1        |       Paul    
Hamid Pourjam
  • 20,441
  • 9
  • 58
  • 74
Marco Staffoli
  • 2,475
  • 2
  • 27
  • 29
3

I would recommend you to perform load tests and measure the performance of the site under stress. If you are performing complex queries on each request you may consider caching some results.

Darin Dimitrov
  • 1,023,142
  • 271
  • 3,287
  • 2,928