
This question is sort of a sequel to that question.

When we want to build a WCF service that works with some kind of data, it's natural that we want it to be fast and efficient. To achieve that, we have to make sure every segment of the data's round trip is as fast as it can be, from the storage back-end, such as SQL Server, to the WCF client that requested the data.

While seeking an answer to that previous question, we learned, thanks to Slauma and others who contributed through comments, that the time-consuming part of Entity Framework's (first) large query is object materialization and attaching entities to the context when the result returns from the database. We have seen that everything works much faster on subsequent queries.

Assuming those large queries are used as read-only operations, we came to the conclusion that we could set the EF MergeOption to NoTracking, yielding better first-query performance. What NoTracking does is tell EF to create a separate object for each record retrieved from the database, even when records share the same key. This causes additional processing if the query has .Include() statements, and leads to a much larger result set being returned.
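
For reference, a minimal sketch of how that read-only option is applied in EF 4.x; the ObjectContext name `MyEntities` and the set name `Persons` are placeholders, not names from the question:

```csharp
using System;
using System.Data.Objects;
using System.Linq;

// Sketch, assuming an EF 4.x ObjectContext called "MyEntities" with an
// ObjectSet<Person> named "Persons" (both placeholder names).
using (var context = new MyEntities())
{
    // NoTracking returns detached objects and skips the ObjectStateManager
    // entirely, which is the expensive part of a large read-only query.
    context.Persons.MergeOption = MergeOption.NoTracking;

    var persons = context.Persons
        .Include("District.City.State")
        .Include("Nationality")
        .ToList();
}
```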

The data may be so big that we could easily ask ourselves: did we really help our cause with NoTracking, even though it made the query faster? And maybe only the first query is faster, depending on the number of .Include() statements, because subsequent queries without NoTracking run faster than their NoTracking counterparts: NoTracking creates a lot more objects every time data returns from the server.

The biggest problem is how to efficiently serialize this amount of data - and deserialize it on the client. With serialization already as slow as it is (I am using DataContractSerializer with PreserveObjectReferences set to true, because I am sending EF 4.x generated POCOs to my client and vice versa), do we really want NoTracking to generate even more data? To be honest, I have yet to see the result of a NoTracking query over ~11.000 objects (not counting the navigation properties obtained via .Include()) actually arrive at the client side. The last time I tried to pull this off, the 00:10:00 timeout was triggered (!)
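
To illustrate what PreserveObjectReferences buys (and why duplicated NoTracking objects inflate the payload without it), here is a small self-contained example; the Person/City types are simplified stand-ins for the generated POCOs, and the settings-based constructor overload shown is the newer (post-.NET 4.0) form:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[DataContract]
public class City
{
    [DataMember] public string Name { get; set; }
}

[DataContract]
public class Person
{
    [DataMember] public string Name { get; set; }
    [DataMember] public City City { get; set; }
}

public static class Demo
{
    // Returns true when a City instance shared by two Persons survives a
    // round trip as a single object rather than as two copies.
    public static bool RoundTripSharesCity()
    {
        var city = new City { Name = "Novi Sad" };
        var people = new[]
        {
            new Person { Name = "A", City = city },
            new Person { Name = "B", City = city }
        };

        // PreserveObjectReferences makes the serializer emit z:Id/z:Ref
        // pairs instead of inlining the shared City twice.
        var serializer = new DataContractSerializer(
            typeof(Person[]),
            new DataContractSerializerSettings { PreserveObjectReferences = true });

        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, people);
            stream.Position = 0;
            var copy = (Person[])serializer.ReadObject(stream);
            return ReferenceEquals(copy[0].City, copy[1].City);
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripSharesCity()); // True
    }
}
```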

So if you are still reading this wall of text, tell me how to solve this situation. Which serializer should I use to achieve acceptable results? Currently, without the NoTracking option, the serialization, transport and deserialization of ~11.000 records via a wsHttpBinding-like custom binding on the local machine take ~5 seconds. What scares me is that this large table will most likely contain ~500.000 records eventually.

Dejan Janjušević

3 Answers


Have you considered creating a View Model for your object and doing a projection in the select statement? That should be a lot faster:

 var result = from person in DB.Entities.Persons
              select new PersonViewModel()
              {
                  // The projection itself tells EF which joins to generate,
                  // so no .Include() calls are needed here.
                  Name = person.Name,
                  City = person.District.City.Name,
                  State = person.District.City.State.Name,
                  Nationality = person.Nationality.Name
              };

This requires you to create a PersonViewModel class to hold the flattened data.
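
A minimal PersonViewModel for the projection above might look like this (flattened to plain strings here; adjust the property types if you really need the full City/State entities on the client):

```csharp
// Flat DTO for the grid: plain strings, no navigation properties, so
// there is no object graph to track, fix up, or deep-serialize.
public class PersonViewModel
{
    public string Name { get; set; }
    public string City { get; set; }
    public string State { get; set; }
    public string Nationality { get; set; }
}
```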

You might be able to further speed up things by creating a database view and letting Entity Framework select directly from there.

If you really want the front-end to populate a grid with 500.000 records, then I'd remove the web service layer altogether and use a DataReader to speed up the process. Entity Framework and WCF aren't suited to transforming data at that scale with adequate performance. What you're basically doing here is:

Database -> TDS -> .NET objects -> XML -> Plain text -> XML -> .NET Objects -> UI

While this could easily be reduced to:

Database -> TDS -> UI

Then use Entity Framework to handle the changes to the entities in your business logic. This is in line with the Command and Query Separation pattern: use a technology suitable for high-performance querying of data and link that directly to your app, then use a command strategy to implement your business logic.

OData services might also provide a better way to link your UI directly to the data, as it can be used to quickly query your data allowing you to implement quick filtering without the user really noticing.

If the security settings prohibit direct querying through OData or direct access to the SQL database, consider materializing the objects yourself. Select the data directly from either a view or a query and use an IDataReader to populate your ViewModel directly. That will probably give you the highest performance.
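
A sketch of that manual materialization, assuming a database view named PersonOverview exposing the four flat columns (the view name and the connection string are placeholders):

```csharp
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public static class PersonReader
{
    public static IEnumerable<PersonViewModel> LoadPersons(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT Name, City, State, Nationality FROM PersonOverview", connection))
        {
            connection.Open();
            using (IDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // Straight into the flat ViewModel: no change tracking,
                    // no relationship fix-up, no intermediate entity graph.
                    yield return new PersonViewModel
                    {
                        Name = reader.GetString(0),
                        City = reader.GetString(1),
                        State = reader.GetString(2),
                        Nationality = reader.GetString(3)
                    };
                }
            }
        }
    }
}
```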

There are a lot of alternatives to Entity Framework, created specifically because EF isn't cut out for large data sets. See FluentData, Dapper, Massive or PetaPoco. You might want to use these side-by-side with Entity Framework to handle your large, flat data queries.
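
For comparison, the same flat query through Dapper's Query&lt;T&gt; extension collapses to a couple of lines (a fragment, assuming the Dapper NuGet package and a connectionString variable in scope; the view name is again a placeholder):

```csharp
using System.Data.SqlClient;
using System.Linq;
using Dapper;

using (var connection = new SqlConnection(connectionString))
{
    // Dapper materializes each row directly into the ViewModel by
    // matching column names to property names.
    var persons = connection.Query<PersonViewModel>(
        "SELECT Name, City, State, Nationality FROM PersonOverview").ToList();
}
```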

jessehouwing
  • I think I should definitely use ViewModels. My current design leans more towards a better end-user experience. The DGV contains the entire result set including navigation property objects, so when the user clicks "edit", the edit form calls ParentForm.GetCurrentObject() and binds it to controls. However, now I will consider using ViewModels for the DGV and calling Proxy.GetObjectByID() from a BackgroundWorker on the edit form. Thanks for the tip. – Dejan Janjušević Apr 05 '12 at 06:55
  • About OData or direct access to the SQL database: that's a no-go. I prefer a "regular" WCF service because of the many security features available. About your suggestion to use FluentData, Dapper, Massive or PetaPoco: my joy in using Entity Framework is the ability to LINQ into entities directly while they are still on the SQL server. These other solutions require me to use ugly-looking (to my taste) SQL or SQL-like syntax in my code. – Dejan Janjušević Apr 05 '12 at 06:59
  • I understand your issues with moving away from the good-looking EF LINQ syntax. But if you want better performance and a ViewModel/projection isn't helping enough, there isn't much you can do except maybe implement `Equals` and `GetHashCode` on all your entity classes and hope that the serializer picks this up. – jessehouwing Apr 05 '12 at 10:13

I use Json.Net's implementation of Bson in my RIA application. More info here.

I yield return an IEnumerable as I read from the database and serialize the rows. I find the speed acceptable, and I return entities with roughly 20 properties. This approach should minimize concurrent memory use on the server.
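
A sketch of that streaming pattern with Json.NET's BSON support (the Newtonsoft.Json package; BsonWriter is the class name from Json.NET versions of that era). The input sequence stands in for the row-by-row database read described above:

```csharp
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

public static class BsonStreaming
{
    // Serializes the sequence as it is enumerated, so rows yielded lazily
    // from the database are written out without buffering them all first.
    public static byte[] Serialize(IEnumerable<PersonViewModel> persons)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BsonWriter(stream))
        {
            // BSON requires an object or array at the root, so wrap the
            // sequence in a container object.
            new JsonSerializer().Serialize(writer, new { Rows = persons });
            return stream.ToArray();
        }
    }
}
```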

Doguhan Uluca
  • Does your solution require me to add some special attributes to my POCO classes, or can I just plug it in and expect it to work? I use EF generated POCO entities so I can't modify those definitions - every change I make is lost when regenerating the model. – Dejan Janjušević Apr 05 '12 at 07:05
  • You could change the T4 templates to generate different code. And you could stick the modifications into a partial class if needed. – jessehouwing Apr 05 '12 at 10:13
  • Good comment, I have now noticed that the T4 template generates partial classes. How does this Json.Net serializer's performance compare to [protobuf-net](http://code.google.com/p/protobuf-net/)? – Dejan Janjušević Apr 05 '12 at 10:42
  • In my case, I'm dealing with a generic control that can simply display any given entity in a DataGrid, so the result can be printed or exported to a file. I haven't made any changes to the entities, because they're being serialized and deserialized by Json.Net BSON, which results in an IEnumerable, and RIA supports returning this type to the client. But with 'yield return' I literally return to the client row by row, and that is a COOL thing to be doing. I haven't tested protobuf, but would be interested in the results. – Doguhan Uluca Apr 05 '12 at 13:53
  • Actually, technically I'm returning a POCO entity object that contains an IEnumerable. – Doguhan Uluca Apr 05 '12 at 14:10
  • @duluca here is the page with performance benchmarks of various serializers: http://www.servicestack.net/benchmarks/ – Dejan Janjušević Apr 06 '12 at 07:46

Based on what I have gathered by looking at various reviews and performance benchmarks, I would choose protobuf-net as a serializer. It's just a matter of design whether it can be plugged into my service configuration. More info about that here.
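
For the record, protobuf-net requires explicitly numbered members, which is exactly the design question with generated POCOs: the attributes can live on a hand-written DTO like the hedged sketch below, be attached via a partial class, or be configured at runtime through RuntimeTypeModel instead of attributes:

```csharp
using ProtoBuf;

// Sketch of a protobuf-net contract; field numbers are part of the wire
// format and must stay stable across versions.
[ProtoContract]
public class PersonDto
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public string City { get; set; }
    [ProtoMember(3)] public string State { get; set; }
    [ProtoMember(4)] public string Nationality { get; set; }
}
```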

Although not completely an answer to this question, jessehouwing gave the best answer and I am marking it as accepted.

Dejan Janjušević