0

I have an Azure Function that pulls workorders, logs them, modifies them, and then logs them again so we can capture the before/after on splunk, but it is taking ~3 minutes to complete because execution appears to pause when I call the asynchronous logging function. 3 minutes isn't a crazy long time but I thought that an async call would be basically instant.

The function call being made from within the Azure Function:

// Internal library that applies business logic to the workorders
ExceptionEngine Engine = new ExceptionEngine();

// This is our teams logging library that we built from scratch
Logger Logger = new Logger();

Dictionary<string, WorkOrder> workItems = null;

try
{
    // This pulls ~15 thousand db rows with ~80 columns for each row. So the dataset is sizable but not huge
    workItems = Engine.GetWorkOrderData();    
}
catch (Exception e)
{
    Logger.LogError("Something happened");
}

_ = Logger.LogModelsAsync("precalculationworkorders", workItems.Values.ToList(), Formats.Json);

// Apply business logic to all workorders in the dictionary (this changes values in the referenced object)
foreach (string workOrderID in workItems.Keys.ToList())
{
    workItems[workOrderID].ApplyAllBusinessLogicAndCalculations();
}

_ = Logger.LogModelsAsync("postcalculationworkorders", workItems.Values.ToList(), Formats.Json);

Function body inside the Logger class:

public async Task<List<SendReceipt>> LogModelsAsync(string subject, IEnumerable<Object> iterable, Formats format = Formats.PipeDelimited)
{   
    List<Task<SendReceipt>> tasks = new List<Task<SendReceipt>>();

    // Cycling through the iterable/collections
    foreach (Object model in iterable)
    {
        // Converting model to dictionary
        Dictionary<string, Object> tempDataDict = LoggerHelper.GetPropertiesAndValues(model);
        
        // Generating log and sending to queue
        tasks.Add(this.LogDictAsync(subject, tempDataDict, format:format));
    }

    SendReceipt[] receiptArray = await Task.WhenAll<SendReceipt>(tasks);
    return receiptArray.ToList();
}

As the comments say, workItems will contain ~15,000 objects with ~80 properties each. So it is a moderate sized payload (but it is bigger than anything we've ever logged using this internal Logging library). As you can see, the calls we make to LogModelsAsync are not awaited and are instead being discarded.

I initially thought that the issue could be that since we are modifying workItems with workItems.ApplyAllBusinessLogicAndCalculations(), the process was waiting for LogModelsAsync to complete before we could modify it but we aren't even logging the workItems object, just the values in a list form which, unless I'm mistaken, should mean it doesn't matter what happens to the workItems afterwards because it's not using the referenced memory location.

What I thought would happen here is that Logger.LogModelsAsync would spawn a new thread and then the main thread would immediately continue on to the foreach loop immediately. Then, after all the business logic calculations, we would spawn another thread to log it a second time while the main thread continues on.

Am I misunderstanding how async calls/discarding works or am I doing something that makes it execute synchronously? Why is it waiting for the function call to complete before continuing?

ThomasJazz
  • 99
  • 9
  • What does `LogDictAsync` do? – Dai Mar 30 '22 at 18:11
  • 1
    @ThomasJazz You are not using `await` in your method until after the `foreach`, perhaps your bottleneck is in the `foreach` loop and not the asynchronous operation – Ryan Wilson Mar 30 '22 at 18:14
  • 3
    Unless there is a Task.Run inside LogDictAsync, all this (likely) runs on the same thread. So it has to share it. So it does, and your "main" operation is not going to receive a bigger share because you'd like it to. Async is not about threads, please see https://stackoverflow.com/q/17661428/11683 and https://stackoverflow.com/q/34680985/11683. – GSerg Mar 30 '22 at 18:14
  • 1
    I would highly recommend you consider GSerg's comment and make this multithreaded with Task.Run. You might just be causing unusual results with blocked call responses. When you do so remember to `await` them and never `.Wait()` an async task – ExplodatedFaces Mar 30 '22 at 18:31
  • @ExplodatedFaces There isn't a `Wait()` in this code, or anything that would deadlock. My comment was not an encouragement for the [use of Task.Run](https://blog.stephencleary.com/2013/10/taskrun-etiquette-and-proper-usage.html) either. – GSerg Mar 30 '22 at 18:47
  • @GSerg those links were incredibly helpful and now I have more questions about how async should even be used/how it can be applied in code. Applying the example to my code: I received an order of toast and coffee but the coffee was very intricate and took all of my focus and energy (logging 15k models) so I could not begin the toast (simple arithmetic calculations in the for-loop) at all because I was too busy. And using Task.Run is like asking my coworker to do the toast while im busy with the coffee. I also see you werent encouraging Task.Run but it appears to do exactly what I wanted. – ThomasJazz Mar 30 '22 at 20:17
  • @Gserg Sorry I was referencing making sure to not `.Wait()` if they implement a Task.Run. Essentially @ThomasJazz Task.Run can be used to move a task to a new thread outside of the main thread. If you are having an issue with CPU time on your main thread you might get all of your async responses at the same time (appearing synchronized) because it is too busy to report results while its still working. The OS scheduler will ultimately determine this behavior though. – ExplodatedFaces Mar 30 '22 at 22:46

1 Answers1

1

Thanks to the link GSerg posted and the suggestion from ExplodatedFaces, the logging function call is now instant. Turns out my understanding of async has been fundamentally wrong this entire time as I thought it was multi-threading. Also, this scenario did not fit the use case for the discard operator _.

I thought that using the discard operator was basically like telling the compiler that I don't care about the result of the task and it would therefore continue before the task is completed, but I guess it was still waiting for the task to complete despite my not needing it.

This link from GSerg does a particularly good job at giving an examples for async vs multi-threading: What is the difference between asynchronous programming and multithreading?

The code I modified to solve this:

log.LogInformation($"Calling LogModelsAsync (not awaited)");
// This line has a warning saying the task is not awaited
Task.Run(() => Logger.LogModelsAsync("precalculationworkorders", workItems.Values.ToList(), Formats.Json)); 
log.LogInformation($"Reached the next line of code. (not awaited)");

log.LogInformation($"Calling LogModelsAsync (awaited)");
// This line does not have a warning, but works just as quickly
var t = Task.Run(() => Logger.LogModelsAsync("precalculationworkorders", workItems.Values.ToList(), Formats.Json));
log.LogInformation($"Reached the next line of code. Has the call to LogModelsAsync completed? {t.IsCompleted}");

and the output it produced that shows the main thread continuing past the call to LogModelsAsync despite the task not being finished:

[2022-03-30T19:45:36.455Z] Calling LogModelsAsync (not awaited)
[2022-03-30T19:45:36.456Z] Reached the next line of code. (not awaited)
[2022-03-30T19:45:36.457Z] Calling LogModelsAsync (awaited)
[2022-03-30T19:45:36.459Z] Reached the next line of code. Has the call to LogModelsAsync completed? False
ThomasJazz
  • 99
  • 9
  • 1
    Be aware that the `Task.Run(() =>` lambda captures the `workItems` variable, and is going to interact with it on a `ThreadPool` pool. So make sure that you are not interacting afterwards with the variable on the current thread, unless the variable refers to a thread-safe object. – Theodor Zoulias Mar 30 '22 at 21:43
  • @TheodorZoulias the `LogModelsAsync` function doesn't modify `workItems` at all which I thought would be fine. When you say "current" thread, do you mean the main thread that continues on after `LogModelsAsync` or the one created by calling `Task.Run()`? – ThomasJazz Apr 01 '22 at 23:16
  • By "current thread" I mean the thread that invoked the `var t = Task.Run(...` line. The current thread is not suspended after creating the task, so at that point you have two concurrent execution flows (the other is a `ThreadPool` thread). Let's assume that after creating the task, the current thread does a `workItems.Add(x, y)`, and that the `workItems` is a normal `Dictionary`. This would be an illegal operation. Officially the behavior of your program would become undefined at this point. – Theodor Zoulias Apr 02 '22 at 00:04