1

On a request on my Web API, I'm saving a image to the disk and also processing it with an external API which usually takes several seconds. This is a high traffic API, thus I would like to design it in the most efficient way. The image comes in Base64 "encoding", but this is not pertinent. We can think about it as an arbitrary byte array of 150KB in average (so the saving to disk operation should be really fast).

The workflow is (the first two operations don't need to execute in any order, obviously):

  • Save image on disk (async)
  • Process it on an external API (async)
  • Return Ok in case of both previous operations succeed

With that in mind, I put together this (simplified) code:

public async Task<IActionResult> Post([FromBody] string imageBase64)
{
    // Convert Image
    byte[] imageByteArray = Convert.FromBase64String(imageBase64);

    // Start async Task to Save Image
    Task taskSaveImage = System.IO.File.WriteAllBytesAsync(@"C:\ImagesStorage\image.jpg", imageByteArray);

    // Execute some heavy async processing
    await ProcessOnExternalAPI(imageByteArray);

    // Guarantee that Save Image Task has completed
    await taskSaveImage;

    // Return 200 Ok
    return Ok();
}

This code seems to me the most efficient way to save the image on disk, and also process it with the external API, both at the same time while not blocking the ASP.Net CORE working thread. Is that right, or there is some more efficient way to do it?

Also, there is any problem to share the byte[] imageByteArray object between the two Tasks (hence possibly two threads)? I believe that .Net CORE would take care of it, but I would not be happy to discover that i'm wrong once in production.

  • Obs 1: I know that receiving a stream for the byte array may lead to a better performance than receiving a Base64 encoded byte array. I have no control over that. It's how the API has to be.
  • Obs 2: The request to the external RESTful API (inside the ProcessOnExternalAPI method above) is made using the async method PostAsync from System.Net.HttpClient class.
  • Obs 3: My biggest concern is always have Working Threads to respond to the requests. This is not the only service that my API respond to.
  • Obs 4: I also checked out this question/answer on SO: Using async/await for multiple tasks But over there, there is no concern about the asp.net core working threads, which is my main concern.
  • Obs 5: The code is made based on my humble knowledge of async programming on ASP.Net CORE, and also relying on the this Microsoft's Documentation: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/ That said, I would love to hear from someone with more experience on async, on this type of scenario.
TheGeneral
  • 79,002
  • 9
  • 103
  • 141
Vitox
  • 3,852
  • 29
  • 30
  • 1
    what is your criteria for "most efficient way"? do you measure it by number of seconds this API method takes to complete? or number of resources you use? or some other way? – dee zg Dec 12 '20 at 06:40
  • At about this time you would start benchmarking. – TheGeneral Dec 12 '20 at 06:43
  • @deezg the most efficient way is the way it execute the tasks in parallel, while keeping the threads that actually process asp.net core requests unblocked and available to process another requests while executing the previous mentioned tasks – Vitox Dec 12 '20 at 06:44
  • @TheGeneral So there is no obvious better implementation at this point? My concern is maybe a lack of seeing something obviously wrong with that approach, or some presumption that i did, that is actually incorrect about this implementation – Vitox Dec 12 '20 at 06:47
  • you are not doing anything "obviously wrong". as @The General said, this is the moment to start measuring and decide upon that. what i would probably do, if your business rules allow for it, is to offload requests to 3rd party API to some queue. so, you save your image to disk (or wherever) and add reference to it to a queue. then separate process reads the queue and send requests to 3rd party API. its more complicated infrastructure wise but more efficient and removes load from your API. – dee zg Dec 12 '20 at 06:50
  • @deezg Yes. I totally agree that an asynchronous request-respond approach would be ideal, but that is not possible. That's exactly my motivation on keeping the working threads as available as possible, since I need to keep the connection alive, and client keeps waiting the response synchronously, and at the same time have a lot of other concurrent requests coming... – Vitox Dec 12 '20 at 06:56
  • I think your disk IO is going to throttle before anything else. As such i would consider using a FileStream and controlling the stream buffer size, if its an ssd i would consider setting it to the maximum byte size or higher. Also dont forget the async flag if you are calling the stream with the async overloads. And yeah if you are really serious about this, you would probably want to use a message bus and distribute the work load – TheGeneral Dec 12 '20 at 06:56
  • @TheGeneral Fortunately this is the only disk IO intensive operation on my API, and is well inside the capabilities of the disks. The high traffic is actually spread between the services on this API, so this disk operation happens only on a fraction of the total requests. But that is a nice thinking (we never know would be out next challenge), I didn't thought about that... – Vitox Dec 12 '20 at 07:08
  • As I said my biggest concern is about the threads, since I need them available to the other services which receive a significant bigger share of requests. Since my experience with async programming on an web request environment is not absolutely extensive, I would love to hear from the community if i'm in the right path... – Vitox Dec 12 '20 at 07:12
  • 1
    You only have 2 simple calls here, You are doing the right thing already (assuming you don't want to take more drastic action as has been discussed). You could also use await `Task.WhenAll` though there will be no appreciable difference in execution time – TheGeneral Dec 12 '20 at 07:40
  • You code is not the most efficient, but probably is the most scalable. You could make it more efficient and probably less scalable by using the synchronous `File.WriteAllBytes` instead of the `File.WriteAllBytesAsync`. The asynchronous filesystem APIs [are not implemented efficiently](https://stackoverflow.com/questions/63217657) in .NET. As a side note, by not awaiting both tasks with `Task.WhenAll` you are opening the possibility of the first task becoming fire-and-forget, in case the second task fails. – Theodor Zoulias Dec 12 '20 at 12:03

1 Answers1

4

Is that right, or there is some more efficient way to do it?

In terms of threads, that is correct: your code is doing two concurrent asynchronous operations.

I do prefer using Task.WhenAll, since that makes the intent of the code more explicit (and also handles an edge case where the write-to-disk task can become fire-and-forget, as noted by Theodor in the comments):

public async Task<IActionResult> Post([FromBody] string imageBase64)
{
  // Convert Image
  byte[] imageByteArray = Convert.FromBase64String(imageBase64);

  // Start async Task to Save Image
  Task saveImageTask = System.IO.File.WriteAllBytesAsync(@"C:\ImagesStorage\image.jpg", imageByteArray);

  // Start task to call API
  Task processTask = ProcessOnExternalAPI(imageByteArray);

  // Asynchronously wait for both tasks to complete.
  await Task.WhenAll(saveImageTask, processTask);

  // Return 200 Ok
  return Ok();
}

Also, there is any problem to share the byte[] imageByteArray object between the two Tasks (hence possibly two threads)? I believe that .Net CORE would take care of it, but I would not be happy to discover that i'm wrong once in production.

No, no problems there. It's safe to share values that don't change.

Stephen Cleary
  • 437,863
  • 77
  • 675
  • 810
  • The `Task.WhenAll` approach really does code more clearer. About the "fire-and-forget" issue, that would only happen if I forget to write the await for that task latter (line `await taskSaveImage;` on my example code), right? Just one point seems to be left out: In terms of threads, is this approach correctly letting the Asp.NET Core working thread available while executing the async operations? – Vitox Dec 14 '20 at 01:12
  • 1
    "if I forget" - no; the problem alluded to is that if `ProcessOnExternalAPI` throws, then `taskSaveImage` is never `await`ed; by using `Task.WhenAll`, you ensure they are both always `await`ed, even if one throws. Yes, this approach yields the thread back to the thread pool at the `await Task.WhenAll` line. – Stephen Cleary Dec 14 '20 at 02:47
  • Now I got it. Thanks for the explanation! – Vitox Dec 14 '20 at 04:20