62

I see already a un-answered question here on.

My question is -

Is EF really production ready for large application?

The question originated from these underlying questions -

  1. EF pulls all the records into memory then performs the query operation. How EF would behave when table has around ~1000 records?
  2. For simple edit I have to pull the record edit it and then push to db using SaveChanges()
Gert Arnold
  • 105,341
  • 31
  • 202
  • 291
Abhijeet
  • 13,562
  • 26
  • 94
  • 175
  • 1) EF doesn't pull all the records into memory to do a query. 2) Okay... what does that have to do with large-scale applications? – rliu Oct 11 '13 at 06:21
  • Don't mix: Large data to update vs. large application. EF is very handy for every-day data operations, i.e. displaying, editing, adding single records. It is not optimized to do bulk insert/update operations but there are discussions and attempts and solutions to that. See [this SO Q&A](http://stackoverflow.com/questions/12751258/batch-update-delete-ef5) for more information. – alzaimar Oct 11 '13 at 06:24
  • 1
    @roliu We wanted to make sure not to pull whole tuple when there is a need to update only one value. – Abhijeet Oct 11 '13 at 06:49
  • You shouldn't be changing the post title and tags. EF Core is totally different system, so what you did invalidates all current answers. You should revert them and ask a new question instead. – Ivan Stoev Jan 07 '18 at 02:50

5 Answers5

160

I faced a similar situation where we had a large database with many tables 7- 10 million records each. we used Entity framework to display the data. To get nice performance here's what I learned; My 10 Golden rules for Entity Framework:

  1. Understand that call to database made only when the actual records are required. all the operations are just used to make the query (SQL) so try to fetch only a piece of data rather then requesting a large number of records. Trim the fetch size as much as possible

  2. Yes, (In some cases stored procedures are a better choice, they are not that evil as some make you believe), you should use stored procedures where necessary. Import them into your model and have function imports for them. You can also call them directly ExecuteStoreCommand(), ExecuteStoreQuery<>(). Same goes for functions and views but EF has a really odd way of calling functions "SELECT dbo.blah(@id)".

  3. EF performs slower when it has to populate an Entity with deep hierarchy. be extremely careful with entities with deep hierarchy

  4. Sometimes when you are requesting records and you are not required to modify them you should tell EF not to watch the property changes (AutoDetectChanges). that way record retrieval is much faster

  5. Indexing of database is good but in case of EF it becomes very important. The columns you use for retrieval and sorting should be properly indexed.

  6. When you model is large, VS2010/VS2012 Model designer gets real crazy. so break your model into medium sized models. There is a limitation that the Entities from different models cannot be shared even though they may be pointing to the same table in the database.

  7. When you have to make changes in the same entity at different places, use the same entity, make changes and save it only once. The point is to AVOID retrieving the same record, make changes & save it multiple times. (Real performance gain tip).

  8. When you need the info in only one or two columns try not to fetch the full entity. you can either execute your sql directly or have a mini entity something. You may need to cache some frequently used data in your application also.

  9. Transactions are slow. be careful with them.

  10. SQL Profiler or any query profiler is your friend. Run it when developing your application to see what does EF sends to database. When you perform a join using LINQ or Lambda expression in ur application, EF usually generates a Select-Where-In-Select style query which may not always perform well. If u find any such case, roll up ur sleeves, perform the join on DB and have EF retrieve results. (I forgot this one, the most important one!)

if you keep these things in mind EF should give almost similar performance as plain ADO.NET if not the same.

Simple Fellow
  • 4,315
  • 2
  • 31
  • 44
  • 7
    For the 8th bullet: Get AutoMapper and use `Project().To<>()`. – mostruash Mar 06 '14 at 08:54
  • 1
    2. You MUST use stored procedures? I want to get away from SPROC's because that is not EF really is it. I could just use dapper or something and ONLY use SPROCs for everything and not even bother with EF or anything. – Piotr Kula Nov 28 '14 at 09:22
  • 4
    @ppumkin: Sometimes, there are situations where you cannot effectively issue statements from EF. In that case stored procedures come to help. but it is really your choice. if you can achieve your target without using SP, don't bother using them at all. – Simple Fellow Jan 19 '15 at 13:30
  • 4
    Yea. But 2, a year later, I can conclude that you MUST NOT use SP anymore. The problem is though, that DB designers are threatened by this because that is the last thing they could control. But they don't understand that DB design has changed to maintenance instead. Dev define models, EF build the DB and queries, and DB folk only need to make sure the DB runs quick and set indexing up properly. I don't use SP, TSQL or SSMS at all anymore and my projects highly scalable and extremely fast. Number 2 is incorrect. – Piotr Kula Jan 19 '15 at 13:36
  • 21
    I work on a banking app, SPs are a big help. I cannot say "Hey! we are using ORM now, so let's throw away all your loyal stored procedures that have been working for years & let us write all that in EF from scratch" - "Why?" - "Because I love EF!". As for "DEV DEFINE MODELS, EF BUILD THE DB AND QUERIES, AND DB FOLK ONLY NEED TO MAKE SURE THE DB RUNS QUICK" - this may be your case but for enterprise level apps, tables, SPs, functions are maintained by seperate developers which we call DB developers who use their TSQL and PLSQL expertise. Hope you must have heard of them! – Simple Fellow Jan 19 '15 at 14:20
  • Re: #8 - I was selecting a lot more than 1 or 2 columns, so I thought this wouldn't help, but turns out removing one varchar(2000) column from the query (I selected to a mini entity) cut the execution time of the query from 3 seconds to less than 1 second. – Captain Delano Dec 07 '15 at 04:57
  • 1
    Good list. Regarding stored procedures, you can make an excellent case for them any time you have larger datasets that require crossing contexts. EF just does not handle that well at present. A sproc can easily cross contexts (e.g., two databases) and will out perform any other solution I've encountered. – Ron Mar 08 '17 at 14:28
  • Projecting to a Struct instead of an anonymous object or tuple is 30% faster on EF Core 3.1 - 20 million rows, three properties – BobbyTables Jun 29 '20 at 11:37
  • for the 8th point create a DTO(Data Transfer Object) and use AutoMapper to map the DTO with actual Domain object. – Jaydeep Shil Sep 08 '20 at 13:06
21

1. EF pulls all the records into memory then performs the query operation. How EF would behave when table has around ~1000 records?

That's not true! EF fetches only necessary records and queries are transformed into proper SQL statements. EF can cache objects locally within DataContext (and track all changes made to entities), but as long as you follow the rule to keep context open only when needed, it won't be a problem.

2. For simple edit I have to pull the record edit it and then push to db using SaveChanges()

It's true, but I would not bother in doing that unless you really see the performance problems. Because 1. is not true, you'll only get one record from DB fetched before it's saved. You can bypass that, by creating the SQL query as a string and sending it as a plain string.

MarcinJuraszek
  • 124,003
  • 15
  • 196
  • 263
  • 1
    Thanks, Can I trust EF that it will not do in memory operation when joining multiple tables? – Abhijeet Oct 11 '13 at 06:40
  • It is very possible to do very complex joins on multiple tables with EF in SQL, however it can take some experience to avoid some accidental in memory joins. Entity Linq has a relatively shallow learning curve, but because of this, also has many gotchas. – Aron Oct 11 '13 at 07:18
  • 4
    This answer is misleading. YES, EF DOES PERFORM JOINS IN MEMORY IF it has a set of values that are provided as a part of query or an in memory list, basically for anything that is not from the database, EF will pull everything from the database, perform the operations in memory and returns the results. If you want to experiment, perform a table join with a list or array and see how it behaves. As of now, with EF 6.0, this is still true. – Simple Fellow Sep 17 '14 at 14:08
7
  1. EF translates your LINQ query into an SQL query, so it doesn't pull all records into memory. The generated SQL might not always be the most efficient, but a thousand records won't be a problem at all.
  2. Yes, that's one way of doing it (assuming you only want to edit one record). If you are changing several records, you can get them all using one query and SaveChanges() will persist all of those changes.
Botz3000
  • 39,020
  • 8
  • 103
  • 127
6

EF is not a bad ORM framework. It is a different one with its own characteristics. Compare Microsoft Entity Framework 6, against say NetTiers which is powered by Microsoft Enterprise Library 6.

These are two entirely different beasts. The accepted answer is really good because it goes through the nuances of EF6. Whats key to understand is that each ORM has its own strengths and weaknesses. Compare the project requirements and its data access patterns against the ORM's behavior patterns.

For Example: NetTiers will always give you higher raw performance than EF6. However that is primarily because it is not a point and click ORM and as part and parcel of generating the ORM you will be optimizing your data model, adding custom stored procedures where relevant, etc... if you engineered your data model with the same effort for EF6 you could probably get close to the same performance.

Also consider can you modify the ORM? for example with NetTiers you can add extensions to the codesmith templates to include your own design patterns over and above what is generated by the base ORM library.

Also consider EF6 makes significant use of reflection whereas NetTiers or any library powered by Microsoft Enterprise Library will make heavy use of Generics instead. These are two entirely different approaches. Why so? Because EF6 is based on dynamic reflection whereas NetTiers is based on static reflection. Which is faster and which is better entirely depends on the usage patterns that will be required of the ORM.

Sometimes a hybrid approach works better: Consider for example EF6 for Web API OData endpoints, A few large tables wrapped with NetTiers & Microsoft Enterprise Library with custom stored procedures, and a few large masterdata tables wrapped with a custom built write through object cache where on initial load the record set is streamed into the memory cache using an ADO data reader.

These are all different and they all have their best fit scenarios: EF6, NetTiers, NHibernate, Wilson OR Mapper, XPO from Dev Express, etc...

tcwicks
  • 495
  • 3
  • 11
4

There is no simple answer for your question. The main thing is about what you want to do with your data? And do you need so much data at one time?

EF translated your Queries to SQL so at this time there is no Object in Memory. When you get the data, then the selected records are in memory. If you are selecting a large amount of large objects then it can be a performance killer if you need to manipulate them all.

If you don't need to manipulate them all you can disable change tracking and enable it later for single objects you need to manipulate.

So you see it depends on your type of application. If you need to manipulate a mass of data efficient, then don't use a OR-Mapper!

Otherwise EF is fine, but consider how many objects you really need at one time and what you want to do with them.

Dannydust
  • 763
  • 5
  • 8