1

I have a list of client id and for each client id I need to get data from cassandra. So I am executing all those client id's in parallel instead of using IN clause query which is not good for performance.

So I came up with below code which execute multiple async calls for each client id and it does the job by getting data out of cassandra but is it the right way to execute multiple async calls in parallel or am I doing something wrong here which can affect my performance?

public async Task<IList<Item>> GetAsync(IList<int> clientIds, int processId, int proc, Kyte kt)
{
    var clientMaps = await ProcessCassQueries(clientIds, (ct, batch) => mapper.SingleOrDefaultAsync<ItemMapPoco>(itemMapStmt, batch), "GetPIMValue");

    if (clientMaps == null || clientMaps.Count <= 0)
    {
        return null;
    }
    // .. do other stuff and return
}

// this executes multiple client ids in parallel - but is it the right way considering performance?
private async Task<List<T>> ProcessCassQueries<T>(IList<int> ids, Func<CancellationToken, int, Task<T>> mapperFunc, string msg) where T : class
{
    var requestTasks = ids.Select(id => ProcessCassQuery(ct => mapperFunc(ct, id), msg));
    return (await Task.WhenAll(requestTasks)).Where(e => e != null).ToList();
}

// this might not be good
private Task<T> ProcessCassQuery<T>(Func<CancellationToken, Task<T>> requestExecuter, string msg) where T : class
{
    return requestExecuter(CancellationToken.None);
}

I recently started using C# so have limited knowledge around that so maybe my code might be not good in terms of performance. Specially ProcessCassQueries and ProcessCassQuery methods. Anything that can be improved here or can be written in a better way considering it's a prod code?

Update:

Basis on suggestion, using semaphore to limit number of async calls as shown below:

private var semaphore = new SemaphoreSlim(20);

private async Task<List<T>> ProcessCassQueries<T>(IList<int> ids, Func<CancellationToken, int, Task<T>> mapperFunc, string msg) where T : class
{
    var tasks = ids.Select(async id => 
    {
        await semaphore.WaitAsync();
        try
        {
            return await ProcessCassQuery(ct => mapperFunc(ct, id), msg);
        }
        finally
        {
            semaphore.Release();
        }
    });

  return (await Task.WhenAll(tasks)).Where(e => e != null).ToList();
}
dragons
  • 549
  • 1
  • 8
  • 24

1 Answers1

2

What you are doing is correct. You are launching a bunch of tasks all at once, and then await all of them to complete. There is no inefficiency or bottleneck regarding this specific C# code. It is a bit strange that you pass a hardcoded CancellationToken.None in the ProcessCassQuery, but it will not affect the performance. The performance of the whole operation now depends on the behavior of the Cassandra database, when it is bombarded with multiple simultaneous requests. If it is optimized for this kind of usage then everything will be OK. If not, then your current setup doesn't offer the flexibility of configuring the level of concurrency to a value optimal for the specific database engine. For ways to limit the amount of concurrent async I/O operations look here.

As a side note, according to the official guidelines the asynchronous methods ProcessCassQueries and ProcessCassQuery should have the Async suffix.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • Thanks for your explanation. Appreciate it. I think you made a good point about putting a back pressure instead of launching multiple async calls simultaneously and I will take a look at that link and will try to implement it. Also can you check at this [pastebin link](https://pastebin.com/Rppj0m8H) which has full code for my `GetAsync` method and give me some suggestions if they are also fine in terms of performance leaving cassandra piece. – dragons May 23 '20 at 06:58
  • The code is very simple actually in that link - it is trying to retrieve data from three different tables asynchronously and once data is out it tries to do some manipulation and return the result back. Since I am coding it for the first time in C# so wanted to get some pointers if that is also good and no issues. Let me know if anything can be improved there. Appreciate your help! – dragons May 23 '20 at 06:59
  • @dragons from a quick look there is no apparent performance problem with your code. I noticed a possible bug here: `itmProDict.TryGetValue(kvp.Key, out var ip);`. The `bool` return value is not checked. I don't know if this is important. – Theodor Zoulias May 23 '20 at 08:07
  • got it. Thanks for reviewing it. I will fix that bug. Also I updated my question with the suggestion you gave to limit number of async calls. Does that look right in my new updated code? That is the place we need to make this change right? Let me know if there are any issues in that code. – dragons May 23 '20 at 21:45
  • @dragons your updated code is great. I have no improvement to suggest. – Theodor Zoulias May 24 '20 at 03:18
  • hey I started using your semaphore suggestion recently which I have in my updated code and I realize I get an error here `return (await Task.WhenAll(tasks)).Where(e => e != null).ToList();` since `await Task.WhenAll(tasks))` returns void so it gives me error. Is there anything wrong I did there? – dragons Jun 05 '20 at 03:06
  • @dragons I think that you should add the `return` keyword before the `ProcessCassQuery(...` method call, so that the generated tasks have a result. – Theodor Zoulias Jun 05 '20 at 03:17
  • You mean `return await ProcessCassQuery(ct => mapperFunc(ct, id), msg);` right? I updated my code in the question. Let me know if that's what you mean. With that I don't see any compilation error. – dragons Jun 05 '20 at 03:47
  • 1
    @dragons yeap, without the `return` each async lambda produces a `Task`, instead of the desirable `Task`. – Theodor Zoulias Jun 05 '20 at 04:36
  • I spoke with my colleague about this and he mentioned semaphore won't work here since they aren't threads as they are tasks. I opened a new question [here](https://stackoverflow.com/questions/62288411/how-to-limit-number-of-async-io-tasks-to-database-in-c) with all the details. Any thoughts on that? I need to study c# more looks like I am confuse between `Threads` and `Tasks` here. – dragons Jun 09 '20 at 17:27
  • sure no rush. Just wanted to see your opinion. thanks for your help! – dragons Jun 09 '20 at 17:44