I'm currently working on a WPF application that was built using Entity Framework (database-first) to access data in a SQL Server database.

In the past, the database was on an internal server and I did not notice any performance problems in the application, even though the database is very badly implemented (only tables; no views, indexes, or stored procedures). I'm the one who created it, but it was my first job and I was not very good with databases, so I felt Entity Framework was the best approach, letting me focus mainly on code.

However, the database is now on another server which is way slower. As you can guess, the application now has big performance issues (more than 10 seconds to load a dozen rows, and the same to insert new rows, ...).

  • Should I stay with Entity Framework but try to improve performance by altering the database, adding views and stored procedures?
  • Should I get rid of Entity Framework and use only "basic" code (and improve the database at the same time)?
  • Is there a simple ORM I could use instead of EF?

Time is not an issue here; I can take all the time I want to improve the application, but I can't seem to decide on the best way to make my application evolve.

The database is quite simple (around 10 tables); the only thing that could complicate things is that I store files in it, so I'm not sure I can really use whatever I want. I don't know if it's important, but I also need to display quite a few calculated fields. Any advice?

Feel free to ask any relevant questions.

LosOjos
  • Have you pinpointed the cause of the poor performance? I'm having a hard time believing that the database is the cause (databases are insanely fast, even for tens of thousands of results from a query). How large is the data you're reading? If it's too large it might be a network bandwidth problem, or the problem might lie in processing it in your application, which takes a long time – MindSwipe Jan 21 '19 at 14:18
  • Fix the server, I doubt you can code network / server performance issues away anyway – JFM Jan 21 '19 at 14:18
  • If you already know that your db is the problem, why do you think *any* other ORM will perform any better? Instead of changing those layers that scale well, you should change the layers that **cause** the problems. – MakePeaceGreatAgain Jan 21 '19 at 14:18
  • @MindSwipe There are tons of reasons why even the fastest db may have bad performance. That dbs are fast per se is simply not true. – MakePeaceGreatAgain Jan 21 '19 at 14:20
  • Clearly the connection to the server is the problem because the behavior changed when the server changed, however as you can imagine I can't touch it, it is not something in my control. – LosOjos Jan 21 '19 at 14:22
  • @HimBromBeere And to precise my question, of course I know that my database has issues and I want to improve it no matter what other choice I make. I would like advice on how to make sure I do everything I can to improve the performance (including the change on the database) – LosOjos Jan 21 '19 at 14:23
  • Even if you could gain a few nanoseconds by changing the ORM or heavily changing your application's logic, your actual bottlenecks aren't solved. – MakePeaceGreatAgain Jan 21 '19 at 14:24
  • @HimBromBeere database servers are designed to be as fast as possible, and SQL is insanely fast. You have to do a lot of things seriously wrong for EF to slow down to a crawl. [Here](https://exceptionnotfound.net/dapper-vs-entity-framework-vs-ado-net-performance-benchmarking/) is a link to someone testing and comparing ADO.NET Dapper.NET and EF on around 24'000 datasets. As you can see, even the most time consuming queries are faster than a second – MindSwipe Jan 21 '19 at 14:24
  • We can't help all too much without seeing your database (a diagram would be enough) and the queries you're performing that take so long. Mind posting them? – MindSwipe Jan 21 '19 at 14:26
  • @MindSwipe Saying "dbs are fast" is as wrong as saying "C++ is fast". It always depends on what you're doing with it. Anyway, that seems off-topic for this question. – MakePeaceGreatAgain Jan 21 '19 at 14:26
  • Can you elaborate on the difference between the internal server and the new server? As you say I am sure you could improve db performance, but it seems a little unlikely that it is the main issue as it worked before. – ste-fu Jan 21 '19 at 14:28
  • It might be worth noting that with EF you have something called a "cold query": the first query you perform on your database every time you restart your application, which does setup and such. This cold query can take what feels like forever (mine once took 4 seconds to retrieve 0 datasets, which I did asynchronously on startup to alleviate the problem). So is it only your first query and the rest are faster, or is it every query? – MindSwipe Jan 21 '19 at 14:33
  • @MindSwipe Here is the schema of the database like it was few days ago (pretty much the same now) https://ibb.co/B2JJ3nj – LosOjos Jan 21 '19 at 14:40
  • @ste-fu The main difference is that the first one was a local server and now it's a remote one – LosOjos Jan 21 '19 at 14:41
  • @MindSwipe It is not that slow to access the data, the problem is the worst when it's about updating or deleting rows (to answer your question, it's not just one query) – LosOjos Jan 21 '19 at 14:42
  • Two things; 1: How (physically) far away is the new server? Is it in the same network? What's the connection speed between you and it? (If it's really slow and you're loading kilobytes if not megabytes of information, that might be the problem.) And 2: I see quite a lot of problems with your database schema, like Message having 2 foreign keys to Product and so on. I'd recommend reading about database normalization and implementing best practices – MindSwipe Jan 21 '19 at 14:48
  • @MindSwipe Unfortunately I don't have much information about the new server, how could I figure out the connection speed between me and it? – LosOjos Jan 21 '19 at 14:58
  • @MindSwipe And for the database, the screenshot was not well taken, there is only one foreign key between Message and Product, the other one was between Product and Employee, do you spot other big problems ? – LosOjos Jan 21 '19 at 15:00
  • Are you working in an office environment? If so ask your IT guy (or the guy who set up server), about connection speed and such. If not are you hosting your database in the cloud? And I see general reference problems in the DB schema, Template is referencing Product and Product is referencing Ressource, yet Template is also referencing Ressource? – MindSwipe Jan 21 '19 at 15:05
  • Local as in running on the users' machine? Local as in the server room in your office? – ste-fu Jan 21 '19 at 15:15
  • I'm gonna try to get more information on the server if I can, thanks for trying anyway ! :) For the references in the DB, it's "normal", "Template.product_id" references "Product.product_id", "Template.header_id" references "Ressource.ressource_id", "Template.footer_id" references "Ressource.ressource_id" (a different one) and "Product.logo_id" references "Ressource.ressource_id" (a different one again) – LosOjos Jan 21 '19 at 15:16
  • @ste-fu local as running in a room near me (server in the internal company network, I don't know if it would be the best translation, is it clear ?) – LosOjos Jan 21 '19 at 15:18
  • That makes sense, and the new server is remote as in not in the same building as you? – ste-fu Jan 21 '19 at 15:20
  • Yes exactly, it's a server which is not only accessible by people physically in the company offices but accessible from the outside (clients) too. – LosOjos Jan 21 '19 at 15:22
  • So it looks like the problem is data transfer. And the only thing you can do about that is transferring less data – MindSwipe Jan 22 '19 at 06:34

3 Answers

For performance profiling, the first place I recommend looking is an SQL profiler. This can capture the exact SQL statements that EF is running and help identify possible performance culprits. I cover a few of these here. The schema issues are probably the most relevant place to start. The title targets MVC, but most of the items apply to WPF and any application.

A good, simple profiler that I use for SQL Server is ExpressProfiler. (https://github.com/OleksiiKovalov/expressprofiler)

With the move to a new server, now sending the data over the wire rather than pulling from a local database, the performance issues you're noticing will most likely fall under the category of "loading too much, too often". Now you are not only waiting for the database to load the data, but also for it to be packaged up and sent over the wire. Also, does the new database hold the same data volume and serve only a single client, or does it now serve multiple clients? Another catch for developers is "works on my machine", where local testing databases are smaller and don't deal with concurrent queries from the server (where locks and such can impact performance).

From here, run a copy of the application with an isolated database server (no other clients hitting it to reduce "noise") with the profiler running against it. The things to look out for:

Lazy Loading - These are cases where you have queries to load data, but then see lots (dozens to hundreds) of additional queries being spun off. Your code may say "run this query and populate this data", which you expect to be 1 SQL query, but by touching lazy-loaded properties it can spin off a great many other queries. The solution to lazy loading: if you need the extra data, eager load it with .Include(). If you only need some of the data, look into using .Select() to select view models / DTOs of the data you need rather than relying on complete entities. This will eliminate lazy-load scenarios, but may require some significant changes to your code to work with view models/DTOs. Tools like AutoMapper can help greatly here. Read up on .ProjectTo() to see how AutoMapper can work with IQueryable to eliminate lazy-load hits.
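As a rough sketch (the entity and property names here are hypothetical, not from your schema), the difference between touching lazy-loaded properties, eager loading, and projecting looks like this:

```csharp
// Lazy loading: 1 query for the orders, then 1 extra query per order
// the first time o.Customer is touched (the classic N+1 problem).
var orders = context.Orders.Where(o => o.IsOpen).ToList();
var names = orders.Select(o => o.Customer.Name).ToList();

// Eager loading: a single query with a JOIN pulls orders and customers together.
var ordersWithCustomers = context.Orders
    .Where(o => o.IsOpen)
    .Include(o => o.Customer)
    .ToList();

// Better still: project only the columns you need into a DTO,
// so no full entities are materialized and no lazy loads can fire.
var summaries = context.Orders
    .Where(o => o.IsOpen)
    .Select(o => new OrderSummaryDto
    {
        Id = o.Id,
        CustomerName = o.Customer.Name
    })
    .ToList();
```

The projection version also tends to produce the smallest SQL result set, which matters most when the data travels over a slow link.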

Reading too much - Loading entities can be expensive, especially if you don't need all of that data. Performance culprits include excessive use of .ToList(), which will materialize entire entity sets where a subset of the data, a simple exists check, or a count would suffice. For example, I've seen code that does stuff like this:

var data = context.MyObjects.SingleOrDefault(x => x.IsActive && x.Id == someId);
return (data != null);

This should be:

var isData = context.MyObjects.Where(x => x.IsActive && x.Id == someId).Any();
return isData;

The difference between the two is that in the first example, EF will effectively do a SELECT * operation, so in the case where data is present it will return all columns into an entity, only to later check whether the entity was present. The second statement will run a faster query that simply returns whether a row exists or not.

var myDtos = context.MyObjects.Where(x => x.IsActive && x.ParentId == parentId)
  .ToList()
  .Select( x => new ObjectDto
  {
    Id = x.Id,
    Name = x.FirstName + " " + x.LastName,
    Balance = calculateBalance(x.OrderItems.ToList()),
    Children = x.Children.ToList()
      .Select( c => new ChildDto  
      {
        Id = c.Id,
        Name = c.Name
      }).ToList()
  }).ToList();

Statements like this can go on and get rather complex, but the real problem is the .ToList() before the .Select(). Often these creep in because devs try to do something that EF doesn't understand, like calling a method (i.e. calculateBalance()), and it "works" by first calling .ToList(). The problem is that you are materializing the entire entity at that point and switching to LINQ-to-Objects. This means that any "touches" on related data, such as .Children, will now trigger lazy loads, and further .ToList() calls can pull even more data into memory that might otherwise have been reduced in the query. The culprit to look out for is .ToList() calls: try removing them, select simpler values before calling .ToList(), and then feed that data into view models where the view models can calculate the resulting data.
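A reworked version of the query above, sketched under the assumption that the balance can be computed from a couple of simple columns (an OrderItem.Amount field is assumed here for illustration), keeps the projection inside the query and moves .ToList() to the end:

```csharp
var myDtos = context.MyObjects
    .Where(x => x.IsActive && x.ParentId == parentId)
    .Select(x => new
    {
        x.Id,
        x.FirstName,
        x.LastName,
        // Select only the values the balance calculation needs,
        // instead of materializing whole OrderItem entities.
        Amounts = x.OrderItems.Select(oi => oi.Amount),
        Children = x.Children.Select(c => new { c.Id, c.Name })
    })
    .ToList() // only the selected columns cross the wire
    .Select(x => new ObjectDto
    {
        Id = x.Id,
        Name = x.FirstName + " " + x.LastName,
        Balance = x.Amounts.Sum(), // computed in memory from the narrow projection
        Children = x.Children
            .Select(c => new ChildDto { Id = c.Id, Name = c.Name })
            .ToList()
    })
    .ToList();
```

EF translates everything before the first .ToList() into a single SQL query, so no lazy loads fire and no unused columns are transferred.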

The worst culprit like this I've seen was due to a developer wanting to use a function in a Where clause:

var data = context.MyObjects.ToList().Where(x => calculateBalance(x) > 0).ToList();

That first .ToList() call will attempt to materialize the whole table as entities in memory. A big performance impact, beyond just the time/memory/bandwidth needed to load all of this data, is simply the number of locks the database must take to reliably read/write data. The fewer rows you "touch", and the shorter the time you touch them, the nicer your queries will play with concurrent operations from multiple clients. These problems magnify greatly as systems transition to being used by more users.
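If calculateBalance() only needs a couple of columns, one hedged fix (assuming the function can be rewritten to take those values rather than a full entity; the Credits/Debits columns are purely illustrative) is to project just those columns before materializing:

```csharp
// Sketch: pull a narrow projection, then apply the in-memory function
// to small anonymous-type rows instead of full entities.
var positiveBalanceIds = context.MyObjects
    .Select(x => new { x.Id, x.Credits, x.Debits }) // hypothetical columns
    .ToList() // only three columns per row cross the wire
    .Where(x => calculateBalance(x.Credits, x.Debits) > 0)
    .Select(x => x.Id)
    .ToList();
```

Better still, if the calculation is simple arithmetic, express it directly in the Where clause so EF can translate it to SQL and no rows are filtered client-side at all.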

Provided you've eliminated extra lazy loads and unnecessary queries, the next thing to look at is query performance. For operations that seem slow, copy the SQL statement out of the profiler and run that in the database while reviewing the execution plan. This can provide hints about indexes you can add to speed up queries. Again, using .Select() can greatly increase query performance by using indexes more efficiently and reducing the amount of data the server needs to pull back.

For file storage: Are these stored as columns in a relevant table, or in a separate table that is linked to the relevant record? What I mean by this, is if you have an Invoice record, and also have a copy of an invoice file saved in the database, is it:

Invoices

  • InvoiceId
  • InvoiceNumber
  • ...
  • InvoiceFileData

or

Invoices

  • InvoiceId
  • InvoiceNumber
  • ...

InvoiceFile

  • InvoiceId
  • InvoiceFileData

It is a better structure to keep large, seldom used data in separate tables rather than combined with commonly used data. This keeps queries to load entities small and fast, where that expensive data can be pulled up on-demand when needed.
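With EF (database-first), that second shape maps naturally to two entities sharing a primary key, where the heavy row is only queried on demand (the names here mirror the illustration above, not your actual schema):

```csharp
public class Invoice
{
    public int InvoiceId { get; set; }
    public string InvoiceNumber { get; set; }
    // Navigation to the heavy data; listing invoices never touches it.
    public virtual InvoiceFile File { get; set; }
}

public class InvoiceFile
{
    public int InvoiceId { get; set; } // shared primary key with Invoice
    public byte[] InvoiceFileData { get; set; }
}

// Load the file bytes only when the user actually opens the document:
var fileData = context.InvoiceFiles
    .Where(f => f.InvoiceId == selectedInvoiceId)
    .Select(f => f.InvoiceFileData)
    .SingleOrDefault();
```

The same idea applies to your Attachment.data and Doc.file_content columns: keep them out of the tables your list screens query.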

If you are using GUIDs for keys (as opposed to ints/longs), are you leveraging newsequentialid()? (Assuming SQL Server.) Keys set to use newid(), or Guid.NewGuid() in code, will lead to index fragmentation and poor performance. If you populate the IDs via database defaults, switch them over to use newsequentialid() to help reduce fragmentation. If you populate IDs via code, have a look at writing a GUID generator that mimics newsequentialid() (SQL Server), or a pattern suited to your database. SQL Server and Oracle store/index GUID values differently, so having the "static-like" part of the UUID bytes in the higher-order vs. lower-order bytes of the data will aid indexing performance. Also consider index maintenance and other database maintenance jobs to help keep the database server running efficiently.
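A rough sketch of such a generator for SQL Server, following the well-known "comb GUID" idea (illustrative only, not production-hardened): SQL Server sorts uniqueidentifier values by the last six bytes first, so embedding a timestamp there makes code-generated keys roughly sequential.

```csharp
using System;

public static class SequentialGuid
{
    // Generates GUIDs that sort sequentially under SQL Server's
    // uniqueidentifier ordering, reducing index fragmentation.
    public static Guid NewSequentialGuid()
    {
        byte[] guidBytes = Guid.NewGuid().ToByteArray();
        byte[] tickBytes = BitConverter.GetBytes(DateTime.UtcNow.Ticks);
        if (BitConverter.IsLittleEndian)
            Array.Reverse(tickBytes); // most significant byte first

        // Place the high-order timestamp bytes into bytes 10..15,
        // the group SQL Server compares first when ordering GUIDs.
        Array.Copy(tickBytes, 2, guidBytes, 10, 6);
        return new Guid(guidBytes);
    }
}
```

Because the random portion still occupies the remaining bytes, uniqueness is preserved while inserts land near the end of the index instead of at random pages.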

When it comes to index tuning, database server reports are your friends. After you've eliminated most, or at least the most serious, performance offenders from your code, the next thing is to look at real-world use of your system. The best way to learn where to target your code/index investigations is via the most-used and most problematic queries that the database server identifies. Where these are EF queries, you can usually reverse-engineer which EF query is responsible based on the tables being hit. Grab these queries and feed them through the execution plan to see if there is an index that might help matters. Indexing is something that developers either forget or get prematurely concerned about: too many indexes can be just as bad as too few. I find it's best to monitor real-world usage before deciding on what indexes to add.

This should hopefully give you a start on things to look for and kick the speed of that system up a notch. :)

Steve Py

First you need to run a performance profiler and find out what the bottleneck is; it could be the database, the Entity Framework configuration, the Entity Framework queries, and so on.

In my experience, Entity Framework is a good option for this kind of application, but you need to understand how it works.

Also, which version of Entity Framework are you using? The latest version is 6.2 and has some performance improvements that older ones do not have, so if you are using an old one I suggest updating it.

  • I've already run a performance profiler, but to be honest it's quite hard to analyse if it's the first time you use one, I don't know where to look! – LosOjos Jan 21 '19 at 15:33
  • Yes, you should update. I suggest learning more about performance profilers; there are memory profilers, CPU profilers, and so on. You need to check what process inside your code is taking a long time to execute. Start with this video: https://www.youtube.com/watch?v=LhhGqNAGQHU – Alex Edwin Velez Jan 21 '19 at 15:40

Based on the comments I am going to hazard a guess that it is mostly a bandwidth issue.

You had an application that was working fine when it was co-located, perhaps a single switch, gigabit ethernet and 200m of cabling.

Now that application is trying to send or retrieve data to/from a remote server, probably over the public internet through an unknown number of internal proxies in contention with who knows what other traffic, and it doesn't perform as well.

You also mention that you store files in the database, and your schema has fields like Attachment.data and Doc.file_content. This suggests that you could be trying to transmit large quantities (perhaps megabytes) of data for a simple query and that is where you are falling down.

Some general pointers:

  • Add indexes for anywhere you are joining tables or values you commonly query on.
  • Be aware of the difference between lazy and eager loading in Entity Framework. There is no right or wrong answer, but you should know which approach you are using and why.
  • Split any file content into its own table, with the same primary key as the main table or play with different EF classes to make sure you only retrieve files when you need to use them.
ste-fu