EF Core version: 3.1.
Here's my method:
public static ILookup<string, int> GetClientCountLookup(DepotContext context, DateRange dateRange)
=> context
.Flows
.Where(e => e.TimeCreated >= dateRange.Start.Date && e.TimeCreated <= dateRange.End.Date)
.GroupBy(e => e.Customer)
.Select(g => new { g.Key, Count = g.Count() })
.ToLookup(k => k.Key, e => e.Count);
All fields used are indexed.
Here's generated query:
SELECT [f].[Customer] AS [Key], COUNT(*) AS [Count]
FROM [Flows] AS [f]
WHERE ([f].[TimeCreated] >= @__dateRange_Start_Date_0) AND ([f].[TimeCreated] <= @__dateRange_End_Date_1)
GROUP BY [f].[Customer]
When that query is executed as SQL, the execution time is 100ms.
When that query is used in code with ToLookup
method - execution time is 3200ms.
What's even more weird - the execution time in EF Core seems totally independent from the data sample size (let's say, depending on date range we can count hundreds, or hundreds of thousands records).
WHAT THE HECK IS HAPPENING HERE?
The query I pasted is the real query EF Core sends. The code fragment I pasted first is executed in 3200ms. Then I took exact generated SQL and executed in as SQL query in Visual Studio - took 100ms.
It doesn't make any sense to me. I use EF Core for a long time and it seem to perform reasonably. Most queries (plain, simple, without date ranges) are fast, results are fetched immediately (in less than 200ms).
In my application I built a really HUGE query with like 4 multi-column joins and subqueries... Guess what - it fetches 400 rows in 3200ms. It also fetches 4000 rows in 3200ms. And also when I remove most of the joins, includes, even remove the subquery - 3200ms. Or 4000, depending on my Internet or server momentary state and load.
It's like constant lag and I pinpointed it to the exact first query I pasted.
I know ToLookup
method causes to finally fetch all input expression results, but in my case (real world data) - there are exactly 5 rows.
The results looks like this:
|------------|-------|
| Key | Count |
|------------|-------|
| Customer 1 | 500 |
| Customer 2 | 50 |
| Customer 3 | 10 |
| Customer 4 | 5 |
| Customer 5 | 1 |
Fetching 5 rows from database takes 4 seconds?! It's ridiculous. If the whole table was fetched, then rows grouped and counted - that would add up. But the generated query returns literally 5 rows.
What is happening here and what am I missing?
Please, DO NOT ASK ME TO PROVIDE THE FULL CODE. It is confidential, part of a project for my client, I am not allowed to disclose my client's trade secrets. Not here nor in any other question. I know it's hard to understand what happens when you don't have my database and the whole application, but the question here is pure theoretical. Either you know what's going on, or you don't. As simple as that. The question is very hard though.
I can only tell the RDBMS used is MS SQL Express running on Ubuntu server, remotely. The times measured are the times of executing either code tests (NUnit) or queries against the remote DB, all performed on my AMD Ryzen 7 8 core 3.40GHz processor. The server lives on Azure, like 2 core of I5 2.4GHz or something like that.