Most answers follow the common wisdom of loading less data, but in some circumstances, such as here, you Absolutely Positively Must load a lot of entities. So how do we do that?
Cause of poor performance
Is it unavoidable for this operation to take this long?
Well, it's not. We are loading just a megabyte of data from disk; the cause of poor performance is that the data is split across 40,000 tiny entities. The database can handle that, but Entity Framework seems to struggle with setting up all those entities, change tracking, and so on. If we do not intend to modify the data, there is a lot we can do.
I tried three things
Primitives
Load just one property and you get a list of primitives.
List<double> dataPoints = _db.DataPoints.Select(dp => dp.Value).ToList();
This bypasses all of the entity creation normally performed by Entity Framework. This query took 0.4 seconds, compared to 18 seconds for the original query. We are talking about a 45 (!) times improvement.
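To see the shape of the result, here is a minimal sketch using LINQ to Objects over an in-memory stand-in for the DataPoints table (against a real DbContext, the same Select is translated to SQL that returns just one column):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the DataPoints table (sensorID, TimeStamp, Value).
var dataPoints = new List<(Guid SensorID, DateTimeOffset TimeStamp, double Value)>
{
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 1.5),
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 2.5),
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 3.0),
};

// Projecting a single property yields a plain List<double>; no entity
// objects are created and nothing is registered with a change tracker.
List<double> values = dataPoints.Select(dp => dp.Value).ToList();
Console.WriteLine(values.Count); // 3
```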
Anonymous types
Of course, most of the time we need more than just an array of primitives.
We can create new objects right inside the LINQ query. Entity Framework won't create the entities it normally would, and the operation runs much faster. We can use anonymous objects for convenience.
var query = _db.DataPoints.Select(dp => new { ID = dp.sensorID, Timestamp = dp.TimeStamp, Value = dp.Value });
This operation takes 1.2 seconds, compared to 18 seconds for normally retrieving the same amount of data.
Tuples
I found that in my case using Tuples instead of anonymous types improves performance a little; the following query executed roughly 30% faster:
var query = _db.DataPoints.Select(dp => Tuple.Create(dp.sensorID, dp.TimeStamp, dp.Value));
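Side by side, the two projections can be sketched with LINQ to Objects over in-memory data (the anonymous type's property names are whatever you pick in the query; Tuple fields come back as Item1..Item3):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the DataPoints table (sensorID, TimeStamp, Value).
var table = new List<(Guid SensorID, DateTimeOffset TimeStamp, double Value)>
{
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 20.5),
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 21.0),
};

// Anonymous-type projection: you choose the property names in the query.
var anon = table.Select(dp => new { ID = dp.SensorID, dp.TimeStamp, dp.Value }).ToList();

// Tuple projection: fields are accessed as Item1, Item2, Item3.
var tuples = table.Select(dp => Tuple.Create(dp.SensorID, dp.TimeStamp, dp.Value)).ToList();

Console.WriteLine(anon[0].Value);   // 20.5
Console.WriteLine(tuples[0].Item3); // 20.5
```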
Other ways
- You cannot use structs inside LINQ query projections, so that's not an option.
- In many cases you can combine multiple records to reduce the overhead associated with retrieving many individual records. Retrieving fewer, larger records can improve performance. For instance, in my use case I have measurements that are taken every 5 minutes, 24/7. At the moment I store them individually, and that's silly: nobody will ever query less than a day's worth of them. I plan to update this post when I make the change and find out how performance changed.
- Some recommend using an object-oriented DB or a micro-ORM. I have never used either, so I can't comment.
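As a sketch of what the combining idea could look like (the layout is hypothetical, not my actual schema): a day of 5-minute readings is 288 values, which can be packed into a single binary column on one row and unpacked after loading.

```csharp
using System;

// 24h / 5min = 288 readings per sensor per day.
double[] readings = new double[288];
for (int i = 0; i < readings.Length; i++) readings[i] = i * 0.1;

// Pack into one blob for storage (e.g. a varbinary column on a single
// daily row instead of 288 individual entity rows).
byte[] blob = new byte[readings.Length * sizeof(double)];
Buffer.BlockCopy(readings, 0, blob, 0, blob.Length);

// Unpack after loading the row back.
double[] restored = new double[blob.Length / sizeof(double)];
Buffer.BlockCopy(blob, 0, restored, 0, blob.Length);

Console.WriteLine(restored.Length); // 288
```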