When writing about 200 blobs of roughly 300 KB each to blob storage using Task.WhenAll(List), the writes hang or take significantly longer than awaiting each write in sequence.

I'm running the process in a function app.

Doesn't work

private async Task WriteToBlobAsync(List<DataSeries> allData)
{
    int blobCount = 0;
    List<Task> blobWriteTasks = new List<Task>();
    foreach (DataSeries series in allData)
    {
        blobCount++;
        string seriesInJson = JsonConvert.SerializeObject(series);
        blobWriteTasks.Add(_destinationBlobStore.WriteBlobAsync(seriesInJson, series.SaveName));
        //await _destinationBlobStore.WriteBlobAsync(seriesInJson, series.SaveName);
        if (blobCount % 100 == 0)
        {
            _flightSummaryDoc.AddLog($"{blobCount} Blobs Complete");
            _log.Info($"{blobCount} Blobs Complete");
        }
    }
    await Task.WhenAll(blobWriteTasks.ToArray());
}

Works significantly faster (but shouldn't)

private async Task WriteToBlobAsync(List<DataSeries> allData)
{
    int blobCount = 0;
    List<Task> blobWriteTasks = new List<Task>();
    foreach (DataSeries series in allData)
    {
        blobCount++;
        string seriesInJson = JsonConvert.SerializeObject(series);
        //blobWriteTasks.Add(_destinationBlobStore.WriteBlobAsync(seriesInJson, series.SaveName));
        await _destinationBlobStore.WriteBlobAsync(seriesInJson, series.SaveName);
        if (blobCount % 100 == 0)
        {
            _flightSummaryDoc.AddLog($"{blobCount} Blobs Complete");
            _log.Info($"{blobCount} Blobs Complete");
        }
    }
    //await Task.WhenAll(blobWriteTasks.ToArray());
}
Tony Ju
A.Rowan
  • What does `WriteBlobAsync` look like? – Stephen Cleary May 02 '19 at 15:22
  • I don't know what the maximum number of connections should be, but I'm almost positive the problem is that you're hammering your server with a bunch of concurrent requests. If I were you, I would try keeping only a few requests open at once. Maybe add new requests only after old ones complete? Maybe await when the count of uncompleted tasks reaches 3-5. I'm not sure if that's the best way to do it. Perhaps this warrants a new SO post on how best to space out concurrent requests. But certainly don't try 200 at once. – Slothario May 02 '19 at 16:03

1 Answer

The writes are most likely slowing down and failing because the connection to blob storage can't handle 200 concurrent requests.

Consider using SemaphoreSlim as a throttling mechanism to limit the number of concurrent requests to something more reasonable.

See this post: How to limit the amount of concurrent async I/O operations?
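As a rough illustration of that idea, here is a minimal sketch of throttling with SemaphoreSlim. The `Throttle.ForEachAsync` helper and its `maxConcurrency` parameter are hypothetical names made up for this example, not part of any blob storage SDK, and the right limit is something you would have to measure.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper for illustration only.
public static class Throttle
{
    // Runs `action` for every item, allowing at most `maxConcurrency`
    // invocations to be in flight at the same time.
    public static async Task ForEachAsync<T>(
        IEnumerable<T> items, Func<T, Task> action, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Wait for a free slot before starting the operation.
                await gate.WaitAsync();
                try
                {
                    await action(item);
                }
                finally
                {
                    // Free the slot even if the operation throws.
                    gate.Release();
                }
            }).ToList();

            // Task.WhenAll completes only after every task has finished,
            // so the semaphore is not disposed while still in use.
            await Task.WhenAll(tasks);
        }
    }
}

// Possible use with the code from the question (the limit of 8 is an
// arbitrary starting point, not a recommendation):
//
// await Throttle.ForEachAsync(
//     allData,
//     series => _destinationBlobStore.WriteBlobAsync(
//         JsonConvert.SerializeObject(series), series.SaveName),
//     maxConcurrency: 8);
```

All writes are still started from a single loop and awaited together, but only a bounded number are outstanding at any moment, which makes it easy to try different limits when benchmarking.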

Slothario
  • Interesting. So 200 is too many; what limit can I safely set on my semaphore? What is the bottleneck? – A.Rowan May 06 '19 at 01:54
  • @A.Rowan I'm not really sure, you'd have to check the blob provider documentation. However, my shoot-from-the-hip guess would be somewhere in the range of 3-10. You don't gain much from using a high number of concurrent connections because it just means that each one goes slower. If I were you I'd just benchmark it and see what sort of number works best in practice. – Slothario May 06 '19 at 13:18