
I am currently working on a project to build an integration between an existing ASP.NET MVC website and a file hosting service my company is using. The typical use case is:

  1. A user requests one or more files
  2. The controller makes one call per file to the file host API
  3. The file host returns the file data to the controller
  4. The controller returns a file result

The hosting service can handle concurrent calls, and I've found that executing each API call within a task (see example below) leads to fairly dramatic improvements.

private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
    var tasks = identifiers.Select(x => RetrieveDocument(results, x)).ToArray();
    Task.WaitAll(tasks);
}

private Task RetrieveDocument(List<FileHostResult> results, DocumentIdentifier x)
{
    return Task.Run(() =>
    {
        var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
        lock (results)
        {
            results.Add(result);
        }
    });
}

My question is whether there is a better way of doing this, and whether there are any potential pitfalls I might run into (e.g. locking server resources).

EDIT 1: I didn't post the code for GetFileHostResultFromFileHost because I don't have access to change it. It's basically a method call implemented in a library I can't change.

EDIT 2: To clarify, my main concern is to avoid harming the current user experience on the site. To that end, I want to make sure that running tasks concurrently from an ASP.NET MVC controller isn't going to lock up the site.

  • sorry @mjwills, the results bit was just a copy-paste issue on my part. I can't really show GetFileHostResultFromFileHost, but it's basically just a web call to a web URL that returns a byte array. – matt.rothmeyer Jun 08 '18 at 21:53
  • `but its basically just a web call to a web url that returns a byte array.` You probably don't need to use `Task.Run` (or `AsParallel`) then since you are likely spinning up threads to do IO (which is pointless). I can't be more specific without seeing the code. – mjwills Jun 08 '18 at 21:55
  • Also see https://stackoverflow.com/questions/1361771/max-number-of-concurrent-httpwebrequests . – mjwills Jun 08 '18 at 21:56
  • Well, the issue really is that when I make multiple requests to the file host concurrently, it returns the results significantly faster than if I queue up one request at a time. The file service is hosted by a provider, so I don't know if they're doing some kind of distributed thing or running the requests on multiple machines, but in this case it doesn't seem pointless. – matt.rothmeyer Jun 08 '18 at 22:02
  • You have done the file downloads synchronously and they are slow. You have added threads to run them in parallel - faster. But if you run them asynchronously (without the additional overhead of threads) - even faster. See https://stackoverflow.com/questions/19389938/c-sharp-download-data-from-huge-list-of-urls and https://stackoverflow.com/questions/19383910/paralell-foreach-with-httpclient-and-continuewith . – mjwills Jun 08 '18 at 22:03
  • Possible duplicate of [Paralell.ForEach with HttpClient and ContinueWith](https://stackoverflow.com/questions/19383910/paralell-foreach-with-httpclient-and-continuewith) – mjwills Jun 08 '18 at 22:05
  • Thanks for the suggestions @mjwills, unfortunately I don't know if I'll be able to access the calls at that level to make them async. I'm using a library that I don't really have the source to, and I don't know if my PM is going to sign off on me rewriting it. – matt.rothmeyer Jun 08 '18 at 22:26

2 Answers


You should use Microsoft's Reactive Framework for this. It is ideally suited to this kind of processing.

Here's the code:

IObservable<FileHostResult> query =
    from i in identifiers.ToObservable()
    from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
    select r;

IList<FileHostResult> results = query.ToList().Wait();

That's it. Observable.Start runs each call on the default scheduler (the thread pool), so the work is spread across an appropriate number of threads without you managing them yourself.

If you want awaitable code then you can do this:

IObservable<FileHostResult> query =
    from i in identifiers.ToObservable()
    from r in Observable.Start(() => GetFileHostResultFromFileHost(i.ExternalIdentifier))
    select r;

IList<FileHostResult> results = await query.ToList();

It's really very simple and easy to code.

Install the "System.Reactive" NuGet package and add `using System.Reactive.Linq;` to your code.

Enigmativity

It is hard to give great advice without seeing the rest of the source code. But based on what I can see I'd suggest an approach like:

private void RetrieveDocuments(DocumentIdentifier[] identifiers, List<FileHostResult> results)
{
    results.AddRange(identifiers.AsParallel().Select(x => RetrieveDocument(x)));
}

private FileHostResult RetrieveDocument(DocumentIdentifier x)
{
    var result = GetFileHostResultFromFileHost(x.ExternalIdentifier);
    return result;
}

The advantages of this approach:

  • No explicit use of Task.Run - let AsParallel take care of that for you.
  • No need to lock the results list - AsParallel and Select take care of that for you.

You may also wish to increase the maximum number of connections you have access to.
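As a sketch of raising that limit on .NET Framework (the value 20 is illustrative, not a recommendation; note that ASP.NET's autoConfig may already raise it for web apps):

```csharp
using System.Net;

public static class ConnectionConfig
{
    // Call once at startup, e.g. from Application_Start in Global.asax.
    public static void Configure()
    {
        // Caps how many simultaneous connections the app opens per host;
        // the classic client default is 2, which would serialize downloads.
        ServicePointManager.DefaultConnectionLimit = 20;
    }
}
```

The same limit can also be set declaratively in web.config under system.net/connectionManagement.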

Being honest though, I think you should look at approaches that don't require new tasks at all, likely by using async HTTP download calls which you can run concurrently without the overhead of a thread per download.
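A minimal sketch of that idea, assuming the file host is reachable over plain HTTP (the real call is hidden inside the OP's library, so the URL handling and the pluggable fetcher overload here are hypothetical):

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

public static class AsyncDownloader
{
    private static readonly HttpClient Client = new HttpClient();

    // Starts every download at once and awaits them together.
    // Unlike Task.Run, no thread is blocked while the IO is in flight.
    public static Task<byte[][]> DownloadAllAsync(string[] urls) =>
        DownloadAllAsync(urls, u => Client.GetByteArrayAsync(u));

    // Overload with a pluggable fetcher, handy for testing without a network.
    public static Task<byte[][]> DownloadAllAsync(
        string[] urls, Func<string, Task<byte[]>> fetch) =>
        Task.WhenAll(urls.Select(fetch));
}
```

With HttpClient.GetByteArrayAsync each request is pure IO: the awaiting thread is released back to the pool until the response arrives, which is exactly the overhead saving the comments describe.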

mjwills
  • Thanks! My main concerns with this are more around how running concurrent tasks may affect the performance of the site, and whether ASP.NET provides some built-in way to do concurrent tasks that is safe. – matt.rothmeyer Jun 08 '18 at 21:57