0

I have to make 10 database calls to get 10 entries based on different parameters. I'm using Task.WhenAll to avoid running them in series, but the runtime indicates they're effectively still in series. Ie. a single entry takes 100ms to pull. Ten entries takes 1000ms. The time is unchanged from awaiting them in series. I've simplified, but will try to capture the essence of the structure. I've also removed my business logic and data structure.

private async Task<IEnumerable<Entry>> GetEntriesAsync(List<long> entryIds)
{
    var tasks = new List<Task<Entry>>();

    foreach (var entryId in entryIds)
    {
        var task = this.GetEntryAsync(entryId);
        tasks.Add(task);
    }

    await Task.WhenAll(tasks);

    var entries = tasks.Select(x => x.Result).ToList();

    return thumbnails;
}

I'd expect this to run in about the same amount of time regardless of the length of entryIds. Instead it scales in time taken with the number of entries.

Next question is what does GetEntryAsync actually do? It instantiates a DbContext, and then runs a Select against IQueryable in that context. The contexts are instantiated for each run through the loop (ie 10 entries = 10 contexts), that way I can make the multiple database requests concurrently.

public virtual async Task<Entry> GetEntryAsync(long entryId) 
{ 
    // unitOfWorkFactory.Create() instantiates a repository with a dbcontext
    // instantiated using DBContextFactory.CreateDBContext()
    var entryRepo = this.unitOfWorkFactory.Create().GetWordsRepository(language);

    Entry entry = await entryRepo.GetEntryAsync(entryId);

    return Entry;
}

Within the entryRepo class I have:

public async Task<Entry> GetEntryAsync(long entryId) 
{ 
    // Entries is a DBSet<Entry>
    IQueryable<Vocabulary> entries = this.dbContext.Entries
                                                   .AsNoTracking()
                                                   .Where(x => x.Id == entryId);

    var entry = await this.GetEntryAsync(entries)

    return entry;
}

private async Task<Entry> GetEntryAsync(IQueryable<Entry> entries) 
{ 
    return await entries.Select(x => new Entry()
                                         {
                                             Id = x.Id
                                         }).FirstOrDefaultAsync();
 }

I need the query to run closer to 100ms rate. I've also tried using the Select method and Parallel.ForEachAsync as described here: Parallel foreach with asynchronous lambda. All have the same result.

I've put console writes in my functions, so I know the tasks are all starting before the first finishes. So the tasks are not actually in series, but for some reason the time taken is as if they are in series.

Edit: important clarification: The where clause is not based on just an entryId, and entryId is not the key Id in the table I'm hitting. There can be multiple rows with that entryId. I'm also using several more parameters to get the specific entry (language, version, secondaryId1, secId2, secId3, groupNumber, type, etc).

I'm not sure how to write a single query because, I need say 10 entries where each have a different combination of the where parameters.

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Dylan Cross
  • 544
  • 2
  • 6
  • 14
  • 3
    Databases don't like being "parallelized" because they are generally IO bound anyway. Why don't you just do one big `.Where` with all your IDs? I would have thought that would be fastest. – Charlieface May 24 '23 at 16:25
  • You're right, the actual where clause is based on 6 fields, and the values are all different for each entry. I simplified since I didn't think the additional fields were the cause here. – Dylan Cross May 24 '23 at 17:00
  • 1
    They are probably running in series due to database locks. You might get better performance if you used a single unit of work to retrieve all of the items instead of creating a new one for each, so that they all come under a single DB transaction. – John Wu May 24 '23 at 17:02
  • I had a single unit of work before. I implemented the multiple unit of work version to try to get it running faster. Basically the same times either way. – Dylan Cross May 24 '23 at 17:03

2 Answers2

2

Parallelization might not increase performance in this case. Like @Charlieface said. Databases don't like that. Querying all entities in one query however helps a lot. If the amount of entryIds you want to query is small. (say: less than 100) using List.Contains() can help a lot.

private async Task<List<Entry>> GetEntriesAsync(List<long> entryIds)
{
    return await this.dbContext.Entries.AsNoTracking()
        .Where(x => entryIds.Contains(x.Id))
        .Select(x => new Entry()
        {
            Id = x.Id
        }).ToListAsync();
}

Linq sends this to SQL Server in an optimized way, because it understands what List.Contains() does. This doesn't work for all methods, but Contains() is special.

Look out for IQueryable methods that starts with 'To' Like ToList() or ToDictionary(). They cause the query to be executed and pull the result in memory. So Calling First() or FirstOrDefaultAsync() after them does nothing for performance.

That's why I choose to use the async version of ToList(), since that is where the querying happens.

  • This essentially worked. I had to add make a much more complicated query, and do some Group By. I also learned you can't throw a Tuple in Contatins with EF, but you gave me the push to try writing a better query again. I actually over query the DB and end up doing some filtering in C# after, but the number of records is minor. Worst case ~200. – Dylan Cross May 25 '23 at 14:27
1

Remove ToList in GetEntryAsync:

 private async Task<Entry> GetEntryAsync(
        IQueryable<Entry> entries) 
        { 
            return await entries.Select(x => new Entry()
                {
                    Id = x.Id
                }).FirstOrDefaultAsync();
        }

Not only you are making method syncronious but you also fetch everything into memory while you need only 1 entry.

But in general you should just write query so it returns all needed entries.

Guru Stron
  • 102,774
  • 10
  • 95
  • 132
  • There actually isn't a ToList in my code. That was a mistake moving it to SO. I'm not how to write a single query. The actually where query has (language, version, id1, id2, id3, groupNumber, type, etc) as parameters and the parameters can all be different for each entry. I used entryId just to simplify my example. – Dylan Cross May 24 '23 at 17:09