Most answers follow the common wisdom of loading less data, but in some circumstances, such as here, you Absolutely Positively Must load a lot of entities. So how do we do that?
Cause of poor performance
Is it unavoidable for this operation to take this long?
Well, it's not. We are loading just a megabyte of data from disk; the cause of poor performance is that the data is split across 40,000 tiny entities. The database can handle that, but Entity Framework seems to struggle with setting up all those entities, change tracking, and so on. If we do not intend to modify the data, there is a lot we can do.
I tried three things
Primitives
Load just one property and you get a list of primitives.
List<double> dataPoints = _db.DataPoints.Select(dp => dp.Value).ToList();
This bypasses all of the entity creation normally performed by Entity Framework. This query took 0.4 seconds, compared to 18 seconds for the original query. We are talking about a 45 (!) times improvement.
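To see the shape of the result, here is a minimal sketch using LINQ to Objects over an in-memory stand-in for the DataPoints table (against a real DbContext, the same Select is translated to SQL that returns just one column):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the DataPoints table (sensorID, TimeStamp, Value).
var dataPoints = new List<(Guid SensorID, DateTimeOffset TimeStamp, double Value)>
{
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 1.5),
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 2.5),
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 3.0),
};

// Projecting a single property yields a plain List<double>; no entity
// objects are created and nothing is registered with a change tracker.
List<double> values = dataPoints.Select(dp => dp.Value).ToList();
Console.WriteLine(values.Count); // 3
```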
Anonymous types
Of course, most of the time we need more than just an array of primitives.
We can create new objects right inside the LINQ query. Entity Framework won't create the entities it normally would, and the operation runs much faster. We can use anonymous objects for convenience.
var query = _db.DataPoints.Select(dp => new { ID = dp.sensorID, Timestamp = dp.TimeStamp, Value = dp.Value });
This operation takes 1.2 seconds, compared to 18 seconds for normally retrieving the same amount of data.
Tuples
I found that in my case using Tuples instead of anonymous types improves performance a little; the following query executed roughly 30% faster:
var query = _db.DataPoints.Select(dp => Tuple.Create(dp.sensorID, dp.TimeStamp, dp.Value));
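Side by side, the two projections can be sketched with LINQ to Objects over in-memory data (the anonymous type's property names are whatever you pick in the query; Tuple fields come back as Item1..Item3):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// In-memory stand-in for the DataPoints table (sensorID, TimeStamp, Value).
var table = new List<(Guid SensorID, DateTimeOffset TimeStamp, double Value)>
{
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 20.5),
    (Guid.NewGuid(), DateTimeOffset.UtcNow, 21.0),
};

// Anonymous-type projection: you choose the property names in the query.
var anon = table.Select(dp => new { ID = dp.SensorID, dp.TimeStamp, dp.Value }).ToList();

// Tuple projection: fields are accessed as Item1, Item2, Item3.
var tuples = table.Select(dp => Tuple.Create(dp.SensorID, dp.TimeStamp, dp.Value)).ToList();

Console.WriteLine(anon[0].Value);   // 20.5
Console.WriteLine(tuples[0].Item3); // 20.5
```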
Other ways
- You cannot use structs inside LINQ query projections, so that's not an option.
- In many cases you can combine multiple records to reduce the overhead associated with retrieving many individual records. Retrieving fewer, larger records can improve performance. For instance, in my use case I have measurements that are taken every 5 minutes, 24/7. At the moment I store them individually, and that's silly: nobody will ever query less than a day's worth of them. I plan to update this post when I make the change and find out how performance changed.
- Some recommend using an object-oriented DB or a micro-ORM. I have never used either, so I can't comment.
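As a sketch of what the combining idea could look like (the layout is hypothetical, not my actual schema): a day of 5-minute readings is 288 values, which can be packed into a single binary column on one row and unpacked after loading.

```csharp
using System;

// 24h / 5min = 288 readings per sensor per day.
double[] readings = new double[288];
for (int i = 0; i < readings.Length; i++) readings[i] = i * 0.1;

// Pack into one blob for storage (e.g. a varbinary column on a single
// daily row instead of 288 individual entity rows).
byte[] blob = new byte[readings.Length * sizeof(double)];
Buffer.BlockCopy(readings, 0, blob, 0, blob.Length);

// Unpack after loading the row back.
double[] restored = new double[blob.Length / sizeof(double)];
Buffer.BlockCopy(blob, 0, restored, 0, blob.Length);

Console.WriteLine(restored.Length); // 288
```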