
I'm trying to figure out the best approach to performing I/O in parallel. For quite a long time I thought I understood this topic and that nothing about it could surprise me. I was wrong.

Question: What is the best way to perform I/O in parallel (e.g. downloading a feed via HTTP), taking into consideration that the service may start refusing requests if there are too many of them?

Before I write any code...

  • when the work I want to perform is some sort of computation, it needs the CPU (a CPU-bound task) -> PLINQ is the best (easiest) option
  • when the work I want to perform is some sort of I/O operation, I want to use the asynchronous model (async/await) so that I don't block the calling thread (allowing a web server to handle more requests while some requests are just waiting for I/O)
  • performance is optimal when the number of threads equals the number of CPUs
  • when performing I/O without the async model, I'm basically blocking a thread until the response arrives
  • threads are expensive (memory allocation, thread management, context switching)
  • a Task is an abstraction over a thread: multiple Tasks may or may not run on the same thread
  • the TaskScheduler takes care of queuing work to threads; the thread pool manages the number of threads according to the current environment and application needs
  • there still must be a place where I await my Task so that the executing thread is not blocked (and can be reused for other work while waiting for the response)

Solution?

I came up with various ways to consume the feed, but almost none of them were throttled, so they could overload the service.

  0. awaiting sequentially in foreach
  1. awaiting Task.WhenAll(tasks)
  2. preparing tasks in parallel and then awaiting them all again
  3. parallel blocking .AsParallel().Select(t => t.Result)
  4. using the asynchronous .ForEachAsync described by Stephen Toub in "Implementing a simple ForEachAsync, part 2", storing results in a ConcurrentBag
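To make the ForEachAsync item above concrete, here is a minimal sketch in the spirit of the partition-based pattern from that post. It is not a verbatim copy: it is written as a plain local function rather than an extension method, `FakeDownloadAsync` is a hypothetical stand-in for the real HTTP call, and the item count (20) and degree of parallelism (4) are arbitrary. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for an HTTP request: waits, then returns its input.
async Task<int> FakeDownloadAsync(int id)
{
    await Task.Delay(50);
    return id;
}

// Splits the source into `dop` partitions and drains each partition with its own
// async loop, so at most `dop` invocations of `body` are in flight at any time.
Task ForEachAsync<T>(IEnumerable<T> source, int dop, Func<T, Task> body)
{
    return Task.WhenAll(
        Partitioner.Create(source)
            .GetPartitions(dop)
            .Select(partition => Task.Run(async () =>
            {
                using (partition)
                {
                    while (partition.MoveNext())
                    {
                        await body(partition.Current);
                    }
                }
            })));
}

// ConcurrentBag because several loops add results concurrently.
var results = new ConcurrentBag<int>();

await ForEachAsync(Enumerable.Range(0, 20), 4, async id =>
{
    results.Add(await FakeDownloadAsync(id));
});

Console.WriteLine($"Collected {results.Count} results");
```

The degree of parallelism is the only knob here. Results arrive in completion order, so the bag is unordered.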

(5) Queue and batches using tasks (throttling)

Allow only a few tasks to run at a time. As with #4, finding the optimal degree of parallelism requires some experimentation.

    {
        var result = new List<DummyDelay>(Total);
        var queue = new List<Task<DummyDelay>>(Parallelism);
        for (var i = 0;; i++)
        {
            // 1. enqueue work
            if (i < Total)
            {
                queue.Add(LoadDummyAsync(Delay, Total, i));
            }

            // 2. no more work, break
            if (queue.Count == 0)
            {
                break;
            }

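            // 3. if the queue is full, or there is no more work to enqueue, await one result
            if (queue.Count == Parallelism || i >= Total)
            {
                Task<DummyDelay> finishedTask = await Task.WhenAny(queue);
                queue.Remove(finishedTask);

                // the task has already completed; await (rather than .Result) rethrows the original exception
                result.Add(await finishedTask);
            }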
            }
        }

        return result;
    }
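The same "at most N in flight" idea can also be expressed with a `SemaphoreSlim`, as one of the comments suggests. The following is only a sketch under assumptions: `RunThrottledAsync` and `FakeDownloadAsync` are hypothetical names, the numbers are arbitrary, and the peak-concurrency counter exists only to show that the limit holds. It assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for an HTTP request: waits, then returns its input.
async Task<int> FakeDownloadAsync(int id)
{
    await Task.Delay(50);
    return id;
}

// Runs `work` for every item with at most `maxConcurrency` operations in flight.
async Task<TResult[]> RunThrottledAsync<TItem, TResult>(
    IEnumerable<TItem> items, int maxConcurrency, Func<TItem, Task<TResult>> work)
{
    using var gate = new SemaphoreSlim(maxConcurrency);

    var tasks = items.Select(async item =>
    {
        await gate.WaitAsync();       // wait for a free slot without blocking a thread
        try
        {
            return await work(item);
        }
        finally
        {
            gate.Release();           // free the slot as soon as this operation ends
        }
    }).ToList();                      // ToList creates every task; the semaphore gates the work

    return await Task.WhenAll(tasks); // results come back in input order
}

// Demo: track how many operations overlap to confirm the limit.
var sync = new object();
int inFlight = 0, peak = 0;

int[] results = await RunThrottledAsync(Enumerable.Range(0, 20), 4, async id =>
{
    lock (sync) { inFlight++; peak = Math.Max(peak, inFlight); }
    try
    {
        return await FakeDownloadAsync(id);
    }
    finally
    {
        lock (sync) { inFlight--; }
    }
});

Console.WriteLine($"{results.Length} results, peak concurrency {peak}");
```

Unlike fixed batches, a slot is refilled as soon as any single operation finishes, so there is no idle gap between one batch completing and the next one starting.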

(6) Producer-consumer pattern (throttling?)

(not implemented) I'm a bit skeptical, because the orchestration around producer-consumer will cost something, and I will still have to await somewhere.
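For reference, a producer-consumer variant could look roughly like the sketch below. It uses `System.Threading.Channels`, which is my choice for illustration and not something the question prescribes. `FakeDownloadAsync`, `Total`, and `Consumers` are hypothetical, and the code assumes .NET 6 or later with top-level statements.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

const int Total = 20;     // hypothetical number of feed items
const int Consumers = 4;  // number of workers, which is also the throttle limit

// Hypothetical stand-in for an HTTP request: waits, then returns its input.
async Task<int> FakeDownloadAsync(int id)
{
    await Task.Delay(50);
    return id;
}

// Bounded channel: the producer waits asynchronously when the buffer is full.
var channel = Channel.CreateBounded<int>(Consumers);
var results = new ConcurrentBag<int>();

// Producer: enqueue the work items, then signal that no more will come.
var producer = Task.Run(async () =>
{
    for (var i = 0; i < Total; i++)
    {
        await channel.Writer.WriteAsync(i);
    }
    channel.Writer.Complete();
});

// Consumers: each worker handles one item at a time until the channel is drained.
var consumers = Enumerable.Range(0, Consumers).Select(_ => Task.Run(async () =>
{
    await foreach (var id in channel.Reader.ReadAllAsync())
    {
        results.Add(await FakeDownloadAsync(id));
    }
})).ToArray();

await Task.WhenAll(consumers.Append(producer));
Console.WriteLine($"Downloaded {results.Count} items");
```

The awaits are still there, but they only suspend the worker loops. The number of workers caps how many requests are outstanding, and the bounded buffer keeps the producer from running far ahead.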

Zdeněk
  • There's just too much here to fit the Stack Overflow Q&A style. Also, if 0 and 2 are not actual solutions, why put code? I would also just pick a batching strategy and just use `await Task.WhenAll(batch)` – Camilo Terevinto May 12 '18 at 20:31
  • @CamiloTerevinto That won’t give me throttling, so it's not very suitable for real-world use, is it? – Zdeněk May 12 '18 at 20:48
  • You may want to look into the TPL Dataflow – Kevin Gosse May 12 '18 at 21:59
  • When you say throttling... So you want to run this in parallel as much as possible, but not make too many requests in a certain amount of time? – TheGeneral May 12 '18 at 23:55
  • Reactive Extensions? – Luis May 13 '18 at 01:12
  • @TheGeneral I want to achieve the best utilization of CPU. Splitting into batches and awaiting batch may result in periods where I'm not processing any task (between first batch completed and starting new batch). I will check TPL Dataflow (Kevin) and Rx (Luis), thanks for pointing those out! – Zdeněk May 13 '18 at 07:10
  • https://stackoverflow.com/a/22493662/3067523 use SemaphoreSlim for throttling – Dmitry Pavliv May 13 '18 at 07:15
