
I am writing a custom handler for HTTP requests that should execute time-consuming code (about 50 ms per request) in parallel. I don't need to return anything to the user, so my only concern is fast execution on a separate CPU core. This is the Configure implementation:

public void Configure(IApplicationBuilder app){
  app.Run(context => {
    return Task.Run(async () => {
      await executeHandle(context);
    });
  });
}

On every request, executeHandle is called.

private static readonly object obj = new object();
public HashSet<string> arrayOfStrings = new HashSet<string>();

public async Task executeHandle(HttpContext context){
  if (context.Request.Body != null){
    using (var ms = new MemoryStream()){
      await Microsoft.AspNetCore.Http.Extensions.StreamCopyOperation.CopyToAsync(
          context.Request.Body, ms, s_maxInMemoryData, context.RequestAborted);
      var requestBody = ms.ToArray();
      string html = Encoding.UTF8.GetString(requestBody);
      // inspect the arrayOfStrings HashSet on each request and execute logic based on
      // the items found
      // build DOM tree using HtmlAgilityPack or some other library, takes 50ms
      // should execute in parallel, not just on a separate thread
      // call BuildDomTree();
    }
  }
}

public static void BuildDomTree(){
  // build DOM and update HashSet
  lock(obj){
    arrayOfStrings.Add("somestring");
  }
}

Since performance is critical here, I'd like to hear some expert opinions on how to call BuildDomTree. Note that I do not return anything to the user, but I still need the result as fast as possible. One option is to use

Task.Run(() => BuildDomTree());

The problem with this is that it only executes on a different thread, but not necessarily in parallel. The other option is to use Parallel.For and wrap it in Task.Run to avoid blocking:

Task.Run(() => Parallel.For(0, 1, _ => BuildDomTree()));
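Whether work started with Task.Run ends up on another core is decided by the operating system's thread scheduler, not by the choice between Task.Run and Parallel.For. A minimal sketch for observing this, assuming .NET Core 3.0 or later (for `Thread.GetCurrentProcessorId()`); `CoreSpreadSketch` and the busy-wait loop standing in for the 50 ms of DOM work are hypothetical:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CoreSpreadSketch
{
    // Starts `count` CPU-bound work items with Task.Run and records, for each one,
    // the processor it was running on when it finished its simulated work.
    public static int[] RecordProcessors(int count, int busyMs)
    {
        var ids = new ConcurrentBag<int>();
        Task[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(() =>
            {
                var sw = Stopwatch.StartNew();
                while (sw.ElapsedMilliseconds < busyMs) { /* simulated CPU-bound work */ }
                // Requires .NET Core 3.0+; treat the value as a hint from the OS.
                ids.Add(Thread.GetCurrentProcessorId());
            }))
            .ToArray();
        Task.WaitAll(tasks);
        return ids.ToArray();
    }

    public static void Main()
    {
        int[] ids = RecordProcessors(Environment.ProcessorCount, 50);
        Console.WriteLine($"{Environment.ProcessorCount} logical processors, " +
                          $"work finished on {ids.Distinct().Count()} distinct processor(s)");
    }
}
```

On an otherwise idle machine the busy work items usually finish on several distinct processors, but the exact numbers depend on load and on the OS, so no particular output is guaranteed.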

Am I overthinking optimization? Is there a better way to execute a single function in parallel?

Theodor Zoulias
Robert Segdewick
  • The ASP.NET pipeline is already async, so just `app.Run(context => executeHandle(context));`. Use `await Task.Run(() => BuildDomTree());` so a thread-pool thread executes it rather than an ASP one. Then, even if your server can handle say 10 concurrent active requests, you could still have 100s or 1000s of these requests "running", and the size of the thread-pool would determine how many `BuildDomTree`s actually execute in parallel. `arrayOfStrings` looks like it needs to be static. – sellotape Nov 22 '19 at 18:17
  • Somewhat related, consider using a concurrent collection to house your `arrayOfStrings`. Something like `ConcurrentBag` would work if you don't specifically need the performance benefits of a `HashSet`. Concurrent collections are thread-safe, meaning you wouldn't need to manually lock the list before adding or reading items. – Patrick Tucci Nov 22 '19 at 21:06
  • Are you sure that `BuildDomTree` is parallelizable? Not all computer problems are [embarrassingly parallel](https://en.wikipedia.org/wiki/Embarrassingly_parallel). – Theodor Zoulias Nov 23 '19 at 12:13
  • Thank you all. The idea is that `BuildDomTree` should run in parallel on a different CPU. There is nothing parallelizable about it; I just need it not to slow down other threads on the same CPU that `executeHandle` is running on. – Robert Segdewick Nov 23 '19 at 13:40
  • So your concern about `Task.Run(() => {BuildDomTree()});` is that it may invoke the `BuildDomTree` method in a different thread, but not in a different CPU? – Theodor Zoulias Nov 24 '19 at 16:18
  • Yes. Exactly that – Robert Segdewick Nov 24 '19 at 17:39
  • Take a look at this: [How Can I Set Processor Affinity in .NET?](https://stackoverflow.com/questions/2510593/how-can-i-set-processor-affinity-in-net) But do you have any reason to believe that the operating system doesn't do a good job of distributing threads efficiently across the available CPU cores? – Theodor Zoulias Nov 24 '19 at 23:39
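Combining the suggestions in the comments, a minimal sketch of the handler might look like the following. In the real app the pipeline line would be `app.Run(context => executeHandle(context));`, as suggested in the first comment; here a plain string stands in for the request body so the example runs as a console program. `HandleAsync`, the busy-wait loop, and the use of `ConcurrentDictionary<string, byte>` as a concurrent set are assumptions: `ConcurrentBag` is also thread-safe but keeps duplicates, so a dictionary with ignored values is swapped in to keep the set semantics of the original `HashSet`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class DomHandlerSketch
{
    // Shared by every request, so it is static. A ConcurrentDictionary whose values
    // are ignored acts as a thread-safe set: unlike ConcurrentBag it drops duplicates
    // and supports a direct ContainsKey lookup, and no explicit lock is needed.
    private static readonly ConcurrentDictionary<string, byte> SeenStrings =
        new ConcurrentDictionary<string, byte>();

    // Hypothetical stand-in for executeHandle: `html` replaces the request body.
    // The CPU-bound work runs on a thread-pool thread, and the returned Task is what
    // the caller (the ASP.NET pipeline, in the real app) awaits.
    public static Task HandleAsync(string html, int busyMs) =>
        Task.Run(() => BuildDomTree(html, busyMs));

    // Hypothetical stand-in for BuildDomTree: spins for busyMs milliseconds to
    // simulate parsing, then records the body in the shared set.
    private static void BuildDomTree(string html, int busyMs)
    {
        var sw = Stopwatch.StartNew();
        while (sw.ElapsedMilliseconds < busyMs) { /* simulated CPU-bound work */ }
        SeenStrings.TryAdd(html, 0);
    }

    public static bool HasSeen(string html) => SeenStrings.ContainsKey(html);

    public static int SeenCount => SeenStrings.Count;

    public static async Task Main()
    {
        // 20 concurrent "requests" with 5 distinct bodies; the thread pool decides
        // how many BuildDomTree calls actually run at the same time.
        var sw = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, 20)
            .Select(i => HandleAsync($"<p>{i % 5}</p>", 50)));
        Console.WriteLine($"{SeenCount} distinct bodies recorded in {sw.ElapsedMilliseconds} ms");
    }
}
```

If the real logic needs to check the set and then act on it atomically (for example "build only if not seen yet"), a single `TryAdd` call or the original `lock` is still needed, because separate `ContainsKey` and `TryAdd` calls can interleave between requests.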

1 Answer


// should execute in parallel, not just on a separate thread

You almost never want parallel code to run on your ASP.NET server. You can very quickly starve your threads, destroying your server's ability to respond to other requests.

But if you are absolutely sure that's what you want to do, then you'll need to make your BuildDomTree parallel itself, preferably using Parallel or Parallel LINQ. Doing Parallel.For(0, 1, ...) is meaningless - it doesn't add any parallelism since there is just one index value in the Parallel.For.
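The point about `Parallel.For(0, 1, ...)` can be checked directly, and the alternative named above (making the work itself parallel with Parallel LINQ) only pays off when the work splits into independent pieces. A minimal sketch; `ParallelismSketch`, the list of HTML fragments, and the tag-counting "parse" are hypothetical stand-ins for real DOM work:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelismSketch
{
    // Parallel.For(0, 1, ...) has a single index, so the body runs exactly once,
    // on one thread: no parallelism is gained.
    public static int InvocationsOfSingleIndexLoop()
    {
        int calls = 0;
        Parallel.For(0, 1, _ => Interlocked.Increment(ref calls));
        return calls;
    }

    // Parallelism only helps when the work splits into independent parts.
    // Hypothetical example: several HTML fragments processed with PLINQ.
    public static int[] ProcessFragments(string[] fragments) =>
        fragments
            .AsParallel()
            .AsOrdered()                          // keep results in input order
            .Select(f => f.Count(c => c == '<'))  // stand-in for per-fragment parsing
            .ToArray();

    public static void Main()
    {
        Console.WriteLine(InvocationsOfSingleIndexLoop()); // prints 1
        int[] counts = ProcessFragments(new[] { "<p></p>", "<b>x</b><i/>" });
        Console.WriteLine(string.Join(",", counts));       // prints 2,3
    }
}
```

Each PLINQ work item still occupies a thread-pool thread, so on a busy server this competes with request handling, which is the starvation risk described above.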

Then there's the question of "fire and forget" on ASP.NET, which is also dangerous. A true "fire and forget" means "I don't care when this finishes, if it finishes, or whether it fails", which is extremely rare. Unless you take steps to notify ASP.NET of your background work, it can be terminated without warning, without exceptions, and without logs.
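In ASP.NET Core, the usual way to tell the host about background work is a hosted service (`IHostedService`, or the `BackgroundService` base class in Microsoft.Extensions.Hosting) that the host starts and stops together with the application. The sketch below shows only the queue-and-consumer part such a service would typically wrap, using System.Threading.Channels (.NET Core 3.0+). `BackgroundWorkQueue` is a hypothetical class, not a framework API; it assumes a single shared instance, with `DrainAsync` called on shutdown.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical background work queue: request handlers enqueue work and return
// immediately, one long-running consumer executes it, and failures are counted
// and logged instead of silently disappearing.
public sealed class BackgroundWorkQueue
{
    private readonly Channel<Action> _channel = Channel.CreateUnbounded<Action>();
    private readonly Task _consumer;
    private int _failures;

    public BackgroundWorkQueue() => _consumer = Task.Run(() => ConsumeAsync());

    public int Failures => Volatile.Read(ref _failures);

    // Called from the request handler; does not wait for the work to run.
    // Returns false once DrainAsync has been called.
    public bool Enqueue(Action work) => _channel.Writer.TryWrite(work);

    // Called on shutdown: stop accepting work and wait for queued items to finish.
    public Task DrainAsync()
    {
        _channel.Writer.TryComplete();
        return _consumer;
    }

    private async Task ConsumeAsync()
    {
        await foreach (Action work in _channel.Reader.ReadAllAsync())
        {
            try
            {
                work();
            }
            catch (Exception ex)
            {
                Interlocked.Increment(ref _failures);
                Console.Error.WriteLine($"Background work failed: {ex.Message}");
            }
        }
    }
}

public static class BackgroundQueueDemo
{
    public static async Task Main()
    {
        var queue = new BackgroundWorkQueue();
        int done = 0;
        for (int i = 0; i < 5; i++) queue.Enqueue(() => Interlocked.Increment(ref done));
        queue.Enqueue(() => throw new InvalidOperationException("boom"));
        await queue.DrainAsync();
        Console.WriteLine($"done={done}, failures={queue.Failures}"); // prints done=5, failures=1
    }
}
```

Unlike a bare Task.Run, failures are recorded and shutdown code has a Task to wait on; work still queued when the process is killed outright is lost either way.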

Stephen Cleary