0

Problem i'm trying to solve:

For each directory, it exists some files, and I want to upload this to Azure.

So I want to do this: Task1 - uploading files in directory 1 to azure Task 2 - uploading files in directory 2 to azure

I want to do this concurrently.

I have the following code:

private async Task ProcessMatFiles(string directory, List<FileInfo> matFiles)
{
    foreach (var file in matFiles)
    {
        if (!string.IsNullOrEmpty(file.Name) && !string.IsNullOrEmpty(directory) && !string.IsNullOrEmpty(file.FullName))
        {
            var cloudBlockBlob = this._cloudBlobContainer.GetBlockBlobReference("textures/" + directory + "/" + file.Name);

            if (!await cloudBlockBlob.ExistsAsync())
                await cloudBlockBlob.UploadFromFileAsync(file.FullName);
        }
    }
List<Task> tasks = new List<Task>();
foreach (var directory in matFileDirectories)
{
    // Get all the files in the directory
    var matFiles = new DirectoryInfo(directory).EnumerateFiles().ToList();

    // Get the directory name of the files
    var matDirectory = Path.GetFileName(Path.GetDirectoryName(matFiles.FirstOrDefault().FullName));

    if (matFiles.Count > 0 && !string.IsNullOrEmpty(matDirectory))
    {
        var task = new Task(() =>this.ProcessMatFiles(matDirectory, matFiles));
        tasks.Add(task);
        task.Start();
    }
}

Task.WaitAll(tasks.ToArray());

With this code, i get the following warning:

Because this call is not awaited, execution of the current method continues before the call is completed. Consider applying the 'await' operator to the result of the call.

What does that mean? How does this affect my code?

I can remove the warning by doing like this:

var task = new Task(async () => await this.ProcessMatFiles());

Is this the correct way to this?

Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
Bryan
  • 3,421
  • 8
  • 37
  • 77
  • No, don't use `Task.Start()` and `Task.Wait()`. That defeats the purpose of tasks. They aren't threads, they are a *promise* that something will produce a result in the future. Use `Task.Run` or `await` – Panagiotis Kanavos Oct 10 '19 at 12:25
  • @PanagiotisKanavos check my updated question. Im creating many tasks and then wait for them to complete. – Bryan Oct 10 '19 at 12:27
  • @PanagiotisKanavos how would you do instead in my case? – Bryan Oct 10 '19 at 12:28
  • Why don't you simply use `await`? Every edit adds extra problems. `Task.Wait()` won't wait on the async process it will await on the task that starts that async process. The result is still a fire-and-forget call with an extra `Task` object for the GC to collect. You could have just written `ProcessMatFiles` – Panagiotis Kanavos Oct 10 '19 at 12:29
  • @PanagiotisKanavos can you give me an example how you would do it? – Bryan Oct 10 '19 at 12:30
  • Do *what*? What is the *actual* problem you want to solve? Process one file at a time without blocking? Simply use `await ProcessMatFiles(matDirectory, matFiles);`. Start processing all files in parallel? Put the `Task` already returned by `ProcessMatFiles` in the list and await all those tasks at the end with `await Task.WhenAll(tasks)`. You can even convert the loop into a LINQ query whose final step is a `select.ProcessMatFiles(matDirectory, matFiles)` – Panagiotis Kanavos Oct 10 '19 at 12:32
  • @PanagiotisKanavos I updated my question with my problem im trying to solve. – Bryan Oct 10 '19 at 12:45
  • @Bryan in that case you need to process the files one by one. Each upload is a separate asynchronous operation – Panagiotis Kanavos Oct 10 '19 at 12:45
  • @PanagiotisKanavos i also provided the code for ProcessMatFiles. I just want to upload the files for the different directories independent of each other. – Bryan Oct 10 '19 at 12:47
  • @PanagiotisKanavos awesome. Brillaint. – Bryan Oct 10 '19 at 13:11

1 Answers1

1

The real problem seems to be how to process multiple files in parallel. ProcessMatFiles returns a Task already and I'd assume it doesn't run anything heavy on the caller's thread. That task can be stored in the tasks list. That list can be awaited without blocking with

await Task.WhenAll(tasks);

A better solution would be to convert the entire loop into a LINQ query that returns the Tasks and await it.

var tasks = from var directory in matFileDirectories
            let dir=new DirectoryInfo(directory)
            let files=dir.GetFiles()
            select ProcessMatFiles(dir.Name, files));

await Task.WhenAll(tasks);

The problem with this is that enumerating the files in a folder is expensive itself, and GetFiles(), or using EnumerateFiles().ToList() has to wait for the enumeration to finish. It would be better if ProcessMatFiles received the DirectoryInfo object and enumerated the files in a separate thread.

Another improvement would be to process the files one-by-one :

var tasks = from var directory in matFileDirectories
            let dir=new DirectoryInfo(directory)
            from file in dir.EnumerateFiles()
            select ProcessMatFile(dir.Name, file));

It's possible to improve this further if one knows what ProcessMatFiles does, eg use Dataflow blocks or Channels for throttling and using a specific number of tasks, breaking the process into multiple concurrent steps etc.

Update

Since this is a file upload operation, each file is a separate asynchronous operation. Most of the checks can be removed when working with DirectoryInfo and FileInfo objects.

The upload method should be just :

async Task Upload(FileInfo file)
{
    var folder=file.Directory.Name;
    var blob = _cloudBlobContainer.GetBlockBlobReference(${"textures/{folder}/{file.Name}";
    if (!await blob.ExistsAsync())
    {
        await blob.UploadFromFileAsync(file.FullName);
    }
}

The task-producing query can be simplified to :

var tasks = from var directory in matFileDirectories
            let dir=new DirectoryInfo(directory)
            from file in dir.EnumerateFiles()
            select UploadFile(file);

await Task.WhenAll(tasks);

This will try to fire off all upload operations as fast as the files can be iterated. This could flood the network. One solution is to use an ActionBlock that will only use eg 8 tasks at a time to upload files. A limit is placed on the input buffer too, to avoid filling it with eg 1000 FileInfo items :

var options=new ExecutionDataflowBlockOptions
      {
         MaxDegreeOfParallelism = 8,  //Only 8 concurrent operations
         BoundedCapacity=64           //Block posters if the input buffer has too many items
      } ;
var block=new ActionBlock<FileInfo>(async file=>UploadFile(file),options);

var files = from var directory in matFileDirectories
            let dir=new DirectoryInfo(directory)
            from file in dir.EnumerateFiles()
            select file;

foreach(var file in files)
{
    //Wait here if the input buffer is full
    await block.SendAsync(file);
}

block.Complete();

//Wait for all uploads to finish
await block.Completion;
Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236