1

So I'm having some difficulties with using Tasks to handle loads of HTTP requests.

What I'm trying to do, is to create a large image from a WMTS. For those who don't know, a WMTS is a Web Map Tile Service. So basically, you can request image tiles that are 256x256 by sending a request with the correct tileRow and tileColumn. So in this case I'm trying to construct images that contains hundreds or even thousands of these image tiles.

To do this, I created an application that:

  • calculates which tiles it needs to request based on input.
  • creates a list that I can use to do the correct HTTP requests to the WMTS.
  • sends these requests to the server and retrieves the images.
  • stitches the images together into one large image. Which is the result that we want.

As you may imagine, the amount of tiles grows exponentially. This doesn't really impact the CPU work, but is mostly I/O bound work. So instead of waiting for each request to return, before sending the next one, I thought it would be a good idea to use Tasks for this. Create the tasks that will handle each individual request, and when all tasks are finished, construct the large image.

So this is the method where I already know what tiles I am going to request. Here I want to recursively send the requests with tasks until all the data is complete (eventually with a max retry mechanism).

    public Dictionary<Tuple<int, int>, Image> GetTilesParallel(List<Tuple<int, int>> tileMatrix, int retry = 0)
    {
        //The dictionary we will return
        Dictionary<Tuple<int, int>, Image> images = new Dictionary<Tuple<int, int>, Image>();

        //The dictionary that we will recursively request if tiles fail.
        List<Tuple<int, int>> failedTiles = new List<Tuple<int, int>>();

        //To track when tasks are finished
        List<Task> tasks = new List<Task>();

        foreach (var request in tileMatrix)
        {
            Tuple<int, int> imageTile = new Tuple<int, int>(request.Item1, request.Item2);

            var t = Task.Factory.StartNew(() => { return GetTileData(imageTile.Item1, imageTile.Item2); }, TaskCreationOptions.LongRunning).ContinueWith(tsk =>
            {
                if (tsk.Status == TaskStatus.RanToCompletion)
                {
                    var response = tsk.Result.Result.Content.ReadAsByteArrayAsync().Result;
                    images.Add(imageTile, Image.FromStream(new MemoryStream(response)));
                }
                else
                {
                    failedTiles.Add(imageTile);
                }
            });

            tasks.Add(t);
        }

        Task.WaitAll(tasks.ToArray());

        if (failedTiles.Count > 0)
        {
            Console.WriteLine($"Retrying {failedTiles.Count} requests");
            Thread.Sleep(500);
            Dictionary<Tuple<int, int>, Image> retriedImages = GetTilesParallel(failedTiles, retry++);

            foreach (KeyValuePair<Tuple<int, int>, Image> retriedImage in retriedImages)
            {
                images.Add(retriedImage.Key, retriedImage.Value);
            }
        }
        return images;
    }

And this is the method that actually performs the HTTP request (I know not optimal or clean, but I'm first trying to get something working).

    private async Task<HttpResponseMessage> GetTileData(int tileColumn, int tileRow)
    {
        WMTSSettings settings = Service.Settings;

        Dictionary<string, string> requestParams = new Dictionary<string, string>();
        requestParams.Add("Request", "GetTile");
        requestParams.Add("Style", "Default");
        requestParams.Add("Service", "WMTS");
        requestParams.Add("Version", this.Service.Version);
        requestParams.Add("TileMatrixSet", settings.WMTSTileMatrixSet);
        requestParams.Add("TileMatrix", settings.WMTSTileMatrixSet + ":" + settings.WMTSTileMatrix);
        requestParams.Add("Format", settings.ImageFormat);
        requestParams.Add("Layer", settings.Layer);
        requestParams.Add("TileCol", tileColumn.ToString());
        requestParams.Add("TileRow", tileRow.ToString());

        string requestString = this.Service.BaseUri;

        for (int i = 0; i < requestParams.Count; i++)
        {
            if (i == 0)
            {
                requestString += "?";
            }

            requestString += requestParams.ElementAt(i).Key;
            requestString += "=";
            requestString += requestParams.ElementAt(i).Value;

            if (i != requestParams.Count - 1)
            {
                requestString += "&";
            }
        }

        CancellationTokenSource source = new CancellationTokenSource();
        CancellationToken token = source.Token;

        Task<HttpResponseMessage> response = HttppClient.GetAsync(requestString, token);

        return await response;
    }

I'm currently facing two problems, to which I've tried a lot of things:

  • In the current setup, in my ContinueWith task I'm getting some weird error that tells me that "Object reference not set to an instance of an object". Even though the response variable and imageTile variable in the ContinueWith task is not null?
  • And another issue is that I'm still getting TaskCancellationExceptions. But If I'm correct, those exceptions should be catched by the Continuation task?

Could someone point me in the right direction with this problem? Or is Tasks even the way to go?

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
Martijn
  • 141
  • 1
  • 8

1 Answers1

5

Yes, tasks are the way to go, but no, ContinueWith is not the way to go. This method is mostly a relic from the pre async-await era, and it's rarely useful nowadays. The same is true for the Task.Factory.StartNew: you rarely need to use this method after the introduction of the Task.Run method.

A convenient way for creating the tasks needed to download the tile data, is the LINQ Select operator. You could use it like this:

async Task<Dictionary<(int, int), Image>> GetAllTileDataAsync(List<(int, int)> tiles)
{
    Task<(int, int, Image)>[] tasks = tiles.Select(async tile =>
    {
        (int tileColumn, int tileRow) = tile;
        int retry = 0;
        while (true)
        {
            try
            {
                using HttpResponseMessage response = await GetTileDataAsync(
                    tileColumn, tileRow);
                response.EnsureSuccessStatusCode();
                byte[] bytes = await response.Content.ReadAsByteArrayAsync();
                Image image = Image.FromStream(new MemoryStream(bytes));
                return (tileColumn, tileRow, image);
            }
            catch
            {
                if (retry >= 3) throw;
            }
            await Task.Delay(1000);
            retry++;
        }
    }).ToArray();
    (int, int, Image)[] results = await Task.WhenAll(tasks);
    return results.ToDictionary(e => (e.Item1, e.Item2), e => e.Item3);
}

Each tile is projected to a Task<(int, int, Image)>. The result of the task contains all the initial and acquired information about the tile. This eliminates the need to rely on dangerous side-effects for constructing the final Dictionary.

Notice the absence of any Task.Factory.StartNew, .ContinueWith, .Result, .Wait() and Task.WaitAll in the above code. All these methods are red flags in modern async-enabled applications. Everything happens by async/await composition. No threads are created, no threads are blocked, and your application reaches maximum scalability and responsiveness.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
  • Regarding the `TaskCanceledException`s, this might be relevant: [HttpClient throws TaskCanceledException on timeout](https://github.com/dotnet/runtime/issues/21965), or this: [HttpClient - A task was cancelled?](https://stackoverflow.com/questions/29179848/httpclient-a-task-was-cancelled) – Theodor Zoulias Jul 17 '21 at 22:18
  • 1
    You my remove `.ToArray()`. `WhenAll` accepts IEnumerable and evaluates it immediately. Also `HttpResponseMessage` is `IDisposable`. – aepot Jul 18 '21 at 22:03
  • 2
    @aepot yeap, indeed. I've added it for simplicity (the type `IEnumerable>` looks daunting), and because it prevents misuse. It is quite easy to enumerate the `IEnumerable` twice by accident, and then get frustrated because all the tasks are executed twice. Materializing the `IEnumerable` prevents this scenario. – Theodor Zoulias Jul 18 '21 at 22:08
  • 1
    Also `Image.FromStream(await response.Content.ReadAsStreamAsync())`, or is it a trick too? – aepot Jul 18 '21 at 22:13
  • @aepot this shouldn't make much of a difference. Check [the implementation](https://github.com/microsoft/referencesource/blob/5697c29004a34d80acdaf5742d7e699022c64ecd/System/net/System/Net/Http/HttpContent.cs#L166) of the `ReadAsStreamAsync`. If I am reading it correctly, a `MemoryStream` is filled internally anyway. – Theodor Zoulias Jul 18 '21 at 22:25
  • 2
    @aepot I added a `using` statement for disposing the `HttpResponseMessage`. Probably it would be better if the `GetTileDataAsync` returned a `Task` instead of a `Task`. – Theodor Zoulias Jul 18 '21 at 22:45