
I am running a series of convert-to-PDF API calls through MS Graph and returning the resulting Base64-encoded strings to the frontend for processing in pdf.js. I want to speed this up with async/await so that as many API calls as possible run at the same time (without using MS Graph batching). However, it is unclear to me whether this code moves on to the next iteration of the loop while a call is in progress, or waits for each awaited result individually, which I do not want. I want the loop to start every request and then return the final result once all the calls have finished.

    public async Task<ActionResult> PDFVersionsToBase64([FromForm] string ID)
    {
        // The frontend sends the ID as FormData
        // Check that an ID was supplied by the dashboard search
        if (String.IsNullOrEmpty(ID))
        {
            return Ok(new { Result = "Failed" });
        }
        else
        {

            String query = "select * from dbo.function(@ID)";
            using (var connection = new SqlConnection(connectionString))
            {
                var documents = connection.Query<QuestionDocuments>(query, new { ID = ID});

                foreach (var file in documents)
                {
                    var fullPath = Path.Combine(fileDirectory, file.path);
                    using (var memoryStream = await oneDrive.DriveItemConvertToPdfAsync(SharedDriveID, fullPath, "path"))
                    {
                        byte[] buffer = new byte[memoryStream.Length];
                        memoryStream.Read(buffer, 0, (int)memoryStream.Length);
                        file.Base64 = Convert.ToBase64String(buffer);
                    }
                }

                return Ok(documents);

            }
        }
    }

    public async Task<Stream> DriveItemConvertToPdfAsync(string DriveID, string PathOrDriveItemID, string SearchType)
    {
        Stream response = null;
        try
        {
            var queryOptions = new List<QueryOption>()
            {
                new QueryOption("format", "pdf")
            };

            if (SearchType == "path")
            {
                response = await _graphServiceClient.Me.Drives[DriveID].Root
                .ItemWithPath(PathOrDriveItemID)
                .Content
                .Request(queryOptions)
                .GetAsync();
            }
            else if (SearchType == "id")
            {
                response = await _graphServiceClient.Me.Drives[DriveID].Items[PathOrDriveItemID]
                .Content
                .Request(queryOptions)
                .GetAsync();
            }
        }
        catch (ServiceException ex)
        {
            Console.WriteLine($"Error converting file to PDF: {ex}");
        }
        return response;
    }
Irish Redneck
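A minimal, self-contained sketch of the two patterns, assuming .NET 6 or later (top-level statements). `FakeConvertAsync` is a hypothetical stand-in for `DriveItemConvertToPdfAsync` that just waits 200 ms instead of calling MS Graph. The first loop awaits inside the `foreach`, as the code above does, so each call starts only after the previous one has finished; the second starts every call first and then awaits them together with `Task.WhenAll`:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical stand-in for DriveItemConvertToPdfAsync: each "conversion" just waits 200 ms.
static async Task<string> FakeConvertAsync(string path)
{
    await Task.Delay(200);
    return path + ".pdf";
}

var paths = new[] { "a.docx", "b.docx", "c.docx", "d.docx", "e.docx" };

// 1) await inside the loop: each call starts only after the previous one has completed.
var sw = Stopwatch.StartNew();
var sequentialResults = new List<string>();
foreach (var path in paths)
{
    sequentialResults.Add(await FakeConvertAsync(path));
}
long sequentialMs = sw.ElapsedMilliseconds;

// 2) start every call first (no await yet), then await them all together.
sw.Restart();
var tasks = new List<Task<string>>();
foreach (var path in paths)
{
    tasks.Add(FakeConvertAsync(path)); // the call is now in flight
}
string[] concurrentResults = await Task.WhenAll(tasks); // results keep the input order
long concurrentMs = sw.ElapsedMilliseconds;

Console.WriteLine($"sequential: {sequentialMs} ms, concurrent: {concurrentMs} ms");
```

With five files, the first loop should take roughly the sum of the delays and the second roughly a single delay; the exact numbers vary by machine.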

1 Answer


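Your code awaits each call inside the `foreach`, so each conversion only starts after the previous one has finished. You need to start all the tasks first and then wait for them all at once with `Task.WhenAll`.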

var files = documents.ToArray();

var tasks = files.Select(file =>
    oneDrive.DriveItemConvertToPdfAsync(
        SharedDriveID, 
        Path.Combine(fileDirectory, file.path),
        "path"));

var streams = await Task.WhenAll(tasks);

for (var i = 0; i < files.Length; i++)
{
    var file = files[i];
    var stream = streams[i];
    if (stream == null) continue; // the Graph call failed and was logged

    using (stream)
    using (var memoryStream = new MemoryStream())
    {
        // Stream.Read may return fewer bytes than requested, so copy the whole stream instead
        stream.CopyTo(memoryStream);
        file.Base64 = Convert.ToBase64String(memoryStream.ToArray());
    }
}

Or, if you want to use the stream's `ReadAsync` method instead, do the reading inside each task, so each file is encoded as soon as its own conversion finishes:

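var tasks = documents.Select(async file => {
    var stream = await oneDrive.DriveItemConvertToPdfAsync(
        SharedDriveID,
        Path.Combine(fileDirectory, file.path),
        "path");
    if (stream == null) return; // the Graph call failed and was logged

    using (var memoryStream = stream)
    {
        byte[] buffer = new byte[memoryStream.Length];
        int offset = 0;
        // ReadAsync may return fewer bytes than requested, so keep reading until the buffer is full
        while (offset < buffer.Length)
        {
            int read = await memoryStream.ReadAsync(buffer, offset, buffer.Length - offset);
            if (read == 0) break; // stream ended early
            offset += read;
        }
        file.Base64 = Convert.ToBase64String(buffer, 0, offset);
    }
});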

await Task.WhenAll(tasks);
Pharaz Fadaei
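Starting every conversion at once works well for a handful of files, but for larger sets MS Graph may throttle the requests (HTTP 429 responses). One common variation, sketched here as an assumption rather than as part of the answer, still creates all the tasks up front but uses a `SemaphoreSlim` to limit how many calls are in flight at the same time. `ConvertThrottledAsync` is hypothetical, `Task.Delay` stands in for the Graph call, and the cap of 3 is arbitrary:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int maxConcurrency = 3;   // assumed cap; tune it against the throttling you actually see
var gate = new SemaphoreSlim(maxConcurrency);
var sync = new object();
int inFlight = 0, peakInFlight = 0;

// Hypothetical stand-in for DriveItemConvertToPdfAsync plus the Base64 step.
async Task<string> ConvertThrottledAsync(string path)
{
    await gate.WaitAsync();     // wait until one of the slots is free
    try
    {
        lock (sync)             // track how many calls are running right now
        {
            inFlight++;
            peakInFlight = Math.Max(peakInFlight, inFlight);
        }
        await Task.Delay(100);  // simulated Graph call
        return path + ".pdf";
    }
    finally
    {
        lock (sync) { inFlight--; }
        gate.Release();         // hand the slot to the next waiting file
    }
}

var paths = Enumerable.Range(1, 10).Select(i => $"file{i}.docx").ToArray();

// All ten tasks are created up front; the semaphore keeps at most three calls running.
string[] results = await Task.WhenAll(paths.Select(p => ConvertThrottledAsync(p)));
Console.WriteLine($"converted {results.Length} files, peak concurrency {peakInFlight}");
```

The trade-off is a longer total time than fully unthrottled calls in exchange for a predictable load on the API.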
  • Thank you for demonstrating this. So under this approach the api calls all occur at once, and since they are the slowest part of my code, that is good to see. However, the second part still runs synchronously, right? Is this the optimal approach? – Irish Redneck Dec 14 '22 at 22:04
  • @IrishRedneck Yes, the for loop will run synchronously. In order to further optimize this, you may use the `ReadAsync` method of the stream instead. – Pharaz Fadaei Dec 14 '22 at 22:23
  • Re "under this approach the api calls all occur at once": not really, you are only allowing .NET to use the most parallelism that it has resources for, and that the rest of the chain (OS, Network layer, Web/API server) can handle as well. For details and nuances, see [this answer by Stephen Cleary](https://stackoverflow.com/a/20622253/) – Peter B Dec 14 '22 at 22:47
  • Is this statement true: if a parent async method calls child async methods without awaiting them, the children will continue to run to completion in the background even after the parent method has completed? I had also read that using async tasks in LINQ is risky. Would that apply to the two methods you propose, and if not, should method 2 be faster? – Irish Redneck Dec 15 '22 at 15:11
  • @IrishRedneck first question: This is called fire and forget. You can do this if you don't care about the result of child tasks. But on ASP.NET, apparently there is no guarantee that they will finish, read more in [this answer](https://stackoverflow.com/a/15523793/1539231). – Pharaz Fadaei Dec 15 '22 at 16:15
  • @IrishRedneck second question: I don't consider it risky as long as you know what you are doing. Here, we simply transform each file into a task, and then we await them all. You can do this without using LINQ. You can't say which one is faster without actually benchmarking them. I guess the efficiency of the second approach largely depends on how large your PDFs are. – Pharaz Fadaei Dec 15 '22 at 16:17
  • @Pharaz Fadaei does each task (file in the LINQ query above) have its own thread for its respective operation? So four files would generate four threads? – Irish Redneck Dec 16 '22 at 00:12
  • @IrishRedneck tasks provide an abstraction layer over threads. When using tasks, you declare that you need some operations to be performed asynchronously. How? You don't care. The TPL is responsible: if a new thread is needed, the TPL will take care of that; if an existing thread is idle and ready to perform a new job, the TPL will use it. Read [this answer](https://stackoverflow.com/a/20622253/1539231). – Pharaz Fadaei Dec 16 '22 at 11:06
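To illustrate the last point with a sketch (not taken from the thread): for genuinely asynchronous I/O, such as the HTTP requests behind the Graph calls, a pending task does not hold a thread while it waits, so four files do not need four threads. Here `TaskCompletionSource` is an assumed stand-in for 10,000 in-flight requests, and `AwaitIoAsync` is a hypothetical method that awaits one of them. All of them are pending at the same time while the process runs on only a small number of OS threads; a thread is only borrowed briefly to run the code after each `await` once a result arrives:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

const int count = 10_000;

// Each TaskCompletionSource models an I/O operation that has started but not finished.
// No thread is assigned to wait on it; its task completes only when SetResult is called.
var pending = Enumerable.Range(0, count)
    .Select(_ => new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously))
    .ToArray();

// Hypothetical async method that awaits the simulated I/O, like awaiting a Graph call.
static async Task<int> AwaitIoAsync(Task<int> io) => await io + 1;

Task<int>[] tasks = pending.Select(tcs => AwaitIoAsync(tcs.Task)).ToArray();

// All 10,000 methods are now suspended at their await.
int threadsWhileWaiting = Process.GetCurrentProcess().Threads.Count;
bool noneFinished = tasks.All(t => !t.IsCompleted);

// "The responses arrive": completing the sources resumes the awaiting methods on pool threads.
for (int i = 0; i < count; i++) pending[i].SetResult(i);
int[] results = await Task.WhenAll(tasks);

Console.WriteLine($"{count} tasks were pending on {threadsWhileWaiting} OS threads");
```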