
In an ASP.NET MVC application I have a project with a graph of activities to track.

A project doesn't have a single root but multiple ones. Every tree can be complex and deep, and every node depends on the others for things like dates and fine-grained user permissions.

I need to process the whole project graph every time I perform an operation on a node, because even different branches depend on each other.

The structure is stored flat in a SQL Server DB.

To create the tree I have a recursive function that does a lot of work to create some data for every node (in the context of the current user).

For example, I have a project with 3000 nodes that takes more than 2 seconds to process with a single call that creates the entire graph.

public static List<Nodes> GetProject(...) {
  var list = new List<Nodes>();
  CreateTreeRecursive(...);
  return list;
}

Remember that I have multiple roots. This lets me parallelize the work and process every branch independently.

If I parallelize the execution with Task.Run or Parallel.ForEach, the time to create the entire graph drops to between 15 and 50 ms, roughly 50 times faster.

public static List<Nodes> GetProject2(...) {

  var list = new List<Nodes>();

  // NB: if the loop body adds to the shared list, access must be
  // synchronized (List<T> is not thread-safe for concurrent writes)
  Parallel.ForEach(...,
    (root) => {
      ...
    });

  return list;
}
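Since the snippet elides how each branch's results get back into the list, here is a minimal sketch of one safe way to collect them. `ProcessBranch` and the `int` node type are placeholders standing in for the real per-branch work, not the actual code; the point is only that a `ConcurrentBag<T>` (or a lock around the list) is needed, because `Parallel.ForEach` callbacks run concurrently.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ProjectLoader
{
    // Hypothetical stand-in for the real per-branch tree building.
    private static IEnumerable<int> ProcessBranch(int root) =>
        Enumerable.Range(root * 10, 3); // pretend each root yields 3 nodes

    public static List<int> GetProjectParallel(IEnumerable<int> roots)
    {
        // ConcurrentBag is safe to Add to from multiple threads,
        // unlike List<T>, which would be corrupted by concurrent Add calls.
        var bag = new ConcurrentBag<int>();

        Parallel.ForEach(roots, root =>
        {
            foreach (var node in ProcessBranch(root))
                bag.Add(node);
        });

        return bag.ToList();
    }
}
```

Note that ordering is not preserved; if the result order matters, sort afterwards, or collect each branch's list into a pre-sized array indexed by root.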

The bad news is that you shouldn't create threads in ASP.NET.

In this specific case I don't have many concurrent users, but with an audience of ~200 users you can't really know for sure.

The other thing is that a project can have many roots, up to 100, so many threads would be created.

This solution would be simple, but it seems inapplicable.

Is there some way to do this in a simple manner, or is my only option to offload the work to some external service that can spawn multiple threads, and wait asynchronously?

If this is the case, I would appreciate some suggestions.

To be clear, this is an operation performed on every user interaction with the project. I can't cache the result; it's too volatile. I can't enqueue the work somewhere and eventually get the result.

Thanks

sevenmy
  • Usually problems like this are handled by caching; is that feasible with your data source? You may not be able to cache an entire tree, but subsets of the tree may be cacheable and not need to be recalculated. – Scott Chamberlain Jun 29 '16 at 14:19
  • Do the users really need to know the state of 3000 individual nodes? If they're different branches and they depend on each other, they don't really feel like different branches to me. Based on what you're saying it's impossible for us to solve your problem, and at best we can give broad general advice, which isn't really useful for solving problems because we don't know what you know. – George Stocker Jun 29 '16 at 14:19
  • @ScottChamberlain Yes, I could cache the branches, but if the edits are frequent I risk having very short-lived cache entries. Every minimal piece of data changed means an invalidation of the branch. – sevenmy Jun 29 '16 at 14:42
  • @GeorgeStocker A branch can depend on another only for the starting date. Most of the time users don't need the entire picture, but sometimes it is necessary. – sevenmy Jun 29 '16 at 14:42
  • I'm starting to admit (I had already thought about it) that I need to cache something, but I am really interested in the parallelization problem in ASP.NET; it would be the simplest solution. – sevenmy Jun 29 '16 at 14:45

1 Answer


The bad news is that you shouldn't create threads in ASP.NET.

This is not true and this wrong assumption is blocking the right solution.

You can create threads. The risk you probably have in mind is that you might exhaust the capacity of the thread pool. That is not easy to do in general.

Your threads are CPU bound. This means that your server is totally overloaded long before the pool is exhausted. Pool capacity is not your limiting factor.

With some assumptions we can make up a concrete scenario: an 8-core server is saturated by 8 runnable threads (like the ones here). But the thread pool would not be considered overloaded with fewer than 100 threads. (The actual number varies; 100 should be safe in a wide range of cases.)
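You can inspect the pool's configured limits and the core count yourself. A quick check (the actual numbers vary by machine and runtime version):

```csharp
using System;
using System.Threading;

static class PoolInfo
{
    // Returns (cores, min worker threads, max worker threads)
    // so the numbers can be inspected or logged.
    public static (int Cores, int MinWorker, int MaxWorker) Snapshot()
    {
        ThreadPool.GetMinThreads(out int minWorker, out _);
        ThreadPool.GetMaxThreads(out int maxWorker, out _);
        return (Environment.ProcessorCount, minWorker, maxWorker);
    }

    static void Main()
    {
        var (cores, min, max) = Snapshot();
        // On most runtimes the worker minimum defaults to the core count,
        // while the maximum is far beyond anything CPU-bound work can use.
        Console.WriteLine($"Cores: {cores}, worker threads: min {min}, max {max}");
    }
}
```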

Further, Parallel.ForEach uses pool threads. It does not create a meaningful number of threads, and it does not occupy one thread per input item.
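You can verify this with a small experiment: run a 100-item loop and count how many distinct pool threads actually execute the body. The count typically lands near the core count, far below 100, because the partitioner hands chunks of the input to a small set of pool threads.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ParallelThreadCount
{
    // Runs a 100-item Parallel.ForEach and reports how many distinct
    // pool threads executed the loop body.
    public static int DistinctThreadsFor100Items()
    {
        var threadIds = new ConcurrentDictionary<int, bool>();

        Parallel.ForEach(Enumerable.Range(0, 100), _ =>
        {
            threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
            Thread.SpinWait(100_000); // simulate a slice of CPU-bound work
        });

        return threadIds.Count;
    }
}
```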

I don't see anything here to worry about.

usr
  • This would be fantastic. So if I have 100 branches, Parallel.ForEach doesn't create 100 threads? And Task.Run? And what is the impact of the number of concurrent users? – sevenmy Jun 29 '16 at 14:52
  • For the first two questions I think you should research that a bit; a lot more than what I can add here has been documented. Regarding concurrent users, that metric is meaningless; let's talk about concurrent requests. I don't see why parallelism would have an impact here, assuming your server is not overloaded on CPU. If you overload it, all bets are off, but that is, I think, intuitively clear. – usr Jun 29 '16 at 14:56
  • Since your computation is done after 50 ms, there is no risk of crowding out other users. – usr Jun 29 '16 at 14:56
  • @sevenmy is the question fully answered for you? – usr Jun 30 '16 at 15:20
  • I'm doing some research and tests. I think I will accept the answer because in my case the solution seems acceptable: the parallelization really improves performance, and I use it only when the graph is big (uncommon, more than 500 nodes) and in conjunction with caching for read-only scenarios, so it shouldn't run on every request. For the sake of the conversation, what do you suggest as an alternative approach, an external Windows service or an Azure WebJob? – sevenmy Jun 30 '16 at 17:09
  • I want to add some links to resources that point out that in general it is not the best idea. See: [1](http://blog.stephencleary.com/2013/11/taskrun-etiquette-examples-dont-use.html), [2](http://stackoverflow.com/questions/33764366/is-task-run-considered-bad-practise-in-an-asp-net-mvc-web-application), [3](http://stackoverflow.com/questions/23137393/parallel-foreach-and-async-await). – sevenmy Jun 30 '16 at 17:14
  • Not sure what that would buy you. It's the same computation in a different place, but now with added RPC load and latency. If you are concerned about blocking other requests, you can control that in-process using a custom TaskScheduler that sets thread priority and restricts the count to cpucount-1. – usr Jun 30 '16 at 17:14
  • 1 is not about parallelism for CPU bound work. It's about async-over-sync. Same for 2. 3 is unrelated. – usr Jun 30 '16 at 17:15
  • Yes, the links aren't about parallelization, and the use of Task in those examples is clearly wrong, but it's for reference. – sevenmy Jun 30 '16 at 17:19
  • About the external service: it can be called truly asynchronously. – sevenmy Jun 30 '16 at 17:20
  • Not sure what that buys you, since execution flow can only continue after the external service has completed. Logically, nothing changes; it's just a different physical call style. Added overhead, added latency, nothing else gained. – usr Jun 30 '16 at 17:24