0

I have a database with three tables and I am making a program that will select and need to display results from all three at the same time.

In order to speed up the getting of records I decided to make the requests async so that they can happen at the same time.
Since you cannot perform more than one concurrent operation on the same DB context, I just instantiate three contexts, in total, and start a Linq query "ToListAsync()" on each one.

But something weird (for me) happens - the async version wasn't any faster than the sync version. Whether I get the results asynchroniously with ".ToListAsync()" or synchronously with ".ToList()" they take the same amount of time. With the sync version, the first operation takes ~2s to complete while the second ~170ms and the third ~90ms. I suppose some sort of execution plan or caching or establishing the connection is what makes the first request take a lot of time while the others complete almost instanteniously but I'd like to know WHY exactly this happens and if there is a way to make it faster.
Forgot to mention the async version takes the same amount of time - 2s.

Relevant code:

        int maxRecords = 10;
        var resultsTableOne = dbContext.tableOne.Where(log =>
                                            log.DateUtc >= dateFrom || dateFrom == null)
                                        && (log.DateUtc <= dateTo || dateTo == null))
                                            .OrderByDescending(log => log.Id).Take(maxRecords).ToList();
        var resultsTableTwo = dbContext.tableTwo.Where(log =>
                                            log.DateUtc >= dateFrom || dateFrom == null)
                                        && (log.DateUtc <= dateTo || dateTo == null))
                                            .OrderByDescending(log => log.Id).Take(maxRecords).ToList();
        var resultsTableThree = dbContext.tableThree.Where(log =>
                                            log.DateUtc >= dateFrom || dateFrom == null)
                                        && (log.DateUtc <= dateTo || dateTo == null))
                                            .OrderByDescending(log => log.Id).Take(maxRecords).ToList();

The select conditions are simplified here. For the async version, the "ToList" is replaced with "ToListAsync", which returns a task on which "GetAwaiter().GetResult()" is called after the code shown here (and the "dbContext" is replaced with three separate instances).

P.S.
The results for the sync version are the same with one and three DbContexts. I.e. it takes the same amount of time whether all three operations are performed on the same instance or each on its own instance.

user14092802
  • 105
  • 10
  • Can you share your code? – smoksnes Mar 22 '21 at 08:30
  • @smoksnes, oh, yeah, sorry - it's edited, now. – user14092802 Mar 22 '21 at 08:59
  • How is `dbContext` instantiated? If you get it from a DI framework of some sort, there's a good chance it's just one instance, not three distinct copies, even if you're assigning it to three different identifiers. – Tieson T. Mar 22 '21 at 09:08
  • `Since you cannot perform more than one concurrent operation on the same DB context,` because a DbContext is **not a connection**, it's a Unit-of-Work and so it's meant to be short-lived and only contain the changes needed for a single business transaction. If `dbContext` is long-lived, you have a bug. That's why `AddDbContext` registers DbContexts as scoped instances that are disposed as soon as the scope (eg request) ends. That's why all examples show `DbContext` instantiated in a `using` block – Panagiotis Kanavos Mar 22 '21 at 09:08
  • 1
    `In order to speed up the getting of records` to do that, fix the queries and indexes. Concurrent execution won't make make the server's disks or CPUs run faster, or the network cards double their bandwidth. *Why is the query slow*? Executing a bad query multiple times makes it *slower* not faster. If the query doesn't have the proper indexes, if long-lived transactions and connections block it, if the lack of indexes causes table-level locks, concurrent queries will only *increase* blocking and IO delays – Panagiotis Kanavos Mar 22 '21 at 09:10
  • Probably also relevant: https://stackoverflow.com/questions/12337671/using-async-await-for-multiple-tasks – Tieson T. Mar 22 '21 at 09:13
  • 1
    Is `dateUtc` indexed? Is `Id`? If not, each query ends up scanning (and locking) the entire table to find matching rows.`log.DateUtc >= dateFrom || dateFrom == null` ??? That's a bug, resulting in bad execution plans. If you don't want to use the `dateFrom` filter, just omit don't use that `Where` clause. That thing alone can force all queries to scan the entire table instead of using any indexes on `DateUtc`. The reason with such catch-all queries is [explained here](https://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/) – Panagiotis Kanavos Mar 22 '21 at 09:14
  • You can use `if (dateFrom!=null){ query=query.Where(log.DateUtc >= dateFrom );}` to add a filter only if needed and avoid the catch-all problems – Panagiotis Kanavos Mar 22 '21 at 09:16
  • `the async version wasn't any faster than the sync version` because concurrency was never the problem. The problems remain. `With the sync version, the first operation takes ~2s to complete while the second ~170ms and the third ~90ms.` the *database server* caches data in RAM, along with the relevant index pages. Which is why loading all data on the client for processing typically results in *bad* performance. – Panagiotis Kanavos Mar 22 '21 at 09:18
  • @TiesonT., sorry, didn't think to add that part, I manually create instances a la "var context1 = new DbContext". – user14092802 Mar 22 '21 at 09:28
  • @PanagiotisKanavos, oh, I see, thank you I learned a lot from your comments! The table doesn't have any indexes and I can't change that, the reason for the "log.DateUtc >= dateFrom || dateFrom == null" is that the user has several input boxes for different filters, including "date from" and "date to" but the user is not always required to use all filters so I was led to believe my approach would omit the clause from the sql query if it is null, as LINQ would be smart enough. – user14092802 Mar 22 '21 at 09:32
  • @PanagiotisKanavos, I understand the physical limitations of the server, but since the data stored in those tables isn't big at all (in fact it's pretty tiny), wouldn't performing a select against each table, separately at the same time speed it up? – user14092802 Mar 22 '21 at 09:33
  • "wouldn't performing a select against each table, separately at the same time speed it up" - not really. If you're actually using `GetAwaiter().GetResult()` in your code, you've essentially forced everything back into synchronous mode (to vastly oversimplify). If the three tables all have the same structure (or are transformed to be the same, in code) a view probable makes more sense - you can union that data in one database call. Indexes _do_ make a pretty significant difference, though, so I'd look into whether those can be added. – Tieson T. Mar 22 '21 at 09:46
  • You might also want to look into LINQKit, if your query predicates need to be (more or less) dynamic. Specifically, PredicateBuilder: https://github.com/scottksmith95/LINQKit#predicatebuilder – Tieson T. Mar 22 '21 at 09:48
  • And: if this data isn't being modified and saved, add `.AsNoTracking()` to the LINQ query, which does improve EF's performance: https://learn.microsoft.com/en-us/ef/core/querying/tracking (that's for EF Core, but the behavior is pretty much the same in EF6) – Tieson T. Mar 22 '21 at 09:53
  • @user14092802 this has nothing to do with server limitations. It's a program and query bug. You won't fix the bug by running it more often, you have to identify and fix the cause. If the data is so tiny, why is it slow? Did you check what's wrong with the query? The execution plan? – Panagiotis Kanavos Mar 22 '21 at 10:15
  • 1
    @user14092802 if you can't use indexes, you can't speed anything up. End of story. The first query will always be slower because it scans the entire table. Subsequent queries will always be faster because the data is cached *until the server clears the cache*. That's a problem with the query and the database design, not the server. Concurrent execution won't make the fundamentally slow query go faster – Panagiotis Kanavos Mar 22 '21 at 10:25
  • @TiesonT., I call `GetAwaiter().GetResult()` after the tasks are started, since I want to return the three result sets at the same time, my thought process was - start the tasks so that the first would have some time to run while the others are started and then wait on it. True, it's not THAT much better but it's the best I could come up that fits the "the three result sets must be returned together at the same time" possibly in the shortest time. I'm also using EF core, sorry for not making it clear. – user14092802 Mar 22 '21 at 10:28
  • @user14092802 ON THE OTHER HAND, the ID is always indexed and I assume, incrementing, so it's *cheap* to request all records since the last ID you've read. If you want to retrieve new log entries, this would be the cheapest way – Panagiotis Kanavos Mar 22 '21 at 10:28
  • @user14092802 no, `GetAwaiter().GetResult()` doesn't start anything, it *awaits* for an already executing task to complete. Tasks aren't threads, they're *promises* that something will produce a result in the future. The query starts executing when you wall `ToListAsync`, not when you block on the already started tasks. You can avoid blocking like that by using `await Task.WhenAll()`. That won't fix the actual problem with the table though. You want fast performance? Use indexes. Period – Panagiotis Kanavos Mar 22 '21 at 10:29
  • @PanagiotisKanavos, it's not slow, per se, I just wanted to make it as fast as possible. In my mind selecting from different tables at once would be an improvement. The last thing I now have left to understand is - if my first query is against "Table1" why are the other queries against "Table2" and "Table3" fast? I'd expect the server to cache the data from Table1, seeing as only it is queried the first time, so why does it cache the data from the other tables? Wait... could it be the first query isn't slow, because of the select but because it takes time to establish the connection? – user14092802 Mar 22 '21 at 10:31
  • @user14092802 `I just wanted to make it as fast as possible.` add indexes then. Or copy the data to another table that can be indexed. – Panagiotis Kanavos Mar 22 '21 at 10:32
  • @PanagiotisKanavos, yeah, yeah, I heard you about the indexes, I know "GetResult" blocks, that's the point, however calling it after you've started three tasks gives time to the two un-GetResulted tasks to run. – user14092802 Mar 22 '21 at 10:32
  • @user14092802 `why are the other queries against "Table2" and "Table3" fast?` are they? Or are you measuring the wrong duration? Or perhaps those "tables" join to the same Audit table? ORMs like EF deal with entities, not tables. Right now we have no idea what the actual tables, actual queries are. What you posted is inadequate, so the question `why exactly` can't be answered. It's clear there are issues though – Panagiotis Kanavos Mar 22 '21 at 10:36

1 Answers1

3

In order to speed up the getting of records I decided to make the requests async so that they can happen at the same time.

This isn't what the async/await pattern does. What you seem to want to do is parallelization, which you can use async methods and their associated Task, however this requires 3 distinct instances of a DbContext. As a very bare-bones example relative to the code you had:

using (var context1 = new AppDbContext())
{
    using (var context2 = new AppDbContext())
    {
        using (var context3 = new AppDbContext())
        {
            var table1Task = context1.tableOne
                .Where(log =>
                    log.DateUtc >= dateFrom || dateFrom == null)
                    && (log.DateUtc <= dateTo || dateTo == null))
                .OrderByDescending(log => log.Id)
                .Take(maxRecords).ToListAsync();
            var table2Task = context2.tableTwo
                .Where(log =>
                    log.DateUtc >= dateFrom || dateFrom == null)
                    && (log.DateUtc <= dateTo || dateTo == null))
                .OrderByDescending(log => log.Id)
                .Take(maxRecords).ToListAsync();
            var table3Task = context3.tableThree
                .Where(log =>
                    log.DateUtc >= dateFrom || dateFrom == null)
                    && (log.DateUtc <= dateTo || dateTo == null))
                .OrderByDescending(log => log.Id)
                .Take(maxRecords).ToListAsync();
           var resultTask = Task.WhenAll(table1Task, table2Task, table3Task);
           try
           {
               resultTask.Wait();
               var result1 = table1Task.Result;
               var result2 = table2Task.Result;
               var result3 = table3Task.Result;
               // Use your 3 lists.
           }
           catch(Exception ex)
           {// Handle any exception when running.
           }
        }
    }
}

Each query must run on a separate DbContext as the DbContext is not thread safe.

As mentioned in the comments, there are far better things to be looking into for improving performance including indexing and using projection rather than returning entire entities. If Table1 rows are quite large with many large fields that aren't needed by the resulting code, projecting with Select to get just the columns you need can help deliver far more performant code. Parallelism is only something I would consider when faced with running several queries where one or more might take a substantial amount of time and the results from each are completely independent.

Steve Py
  • 26,149
  • 3
  • 25
  • 43
  • Upvoted. I would use the [term](https://stackoverflow.com/questions/4844637/what-is-the-difference-between-concurrency-parallelism-and-asynchronous-methods) "concurrency" instead of "parallelization" though. – Theodor Zoulias Mar 22 '21 at 15:16