
I have made an app that uses the Azure Durable Functions fan-out strategy to make parallel queries and updates to a database by sending HTTP requests to our own internal API.

I found out that the fan-out strategy is EXTREMELY slow compared to using the TPL library and doing the parallelism that way in a normal .NET Core web app. It's not just slower, it's about 20 times slower: it takes 10 minutes for 130 updates, while the .NET Core 3.1 app that I made for a speed comparison, which does the exact same thing, does 130 updates in 0.5 minutes, and on a significantly cheaper plan.
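
For reference, the comparison app presumably does something along these lines; this is a minimal, hypothetical sketch of in-process fan-out with the TPL, where all the coordination stays in memory on one machine (the endpoint URL and model type are placeholders, not the actual code):

```csharp
// Hypothetical sketch of the plain .NET Core comparison approach:
// fire all update requests concurrently and await them in memory.
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

public class BulkUpdater
{
    private static readonly HttpClient Client = new HttpClient();

    public async Task RunAsync(IEnumerable<UpdateModel> updates)
    {
        // One task per update; no queues or storage accounts involved.
        var tasks = updates.Select(u =>
            Client.PostAsync(
                "https://internal-api.example.com/items", // placeholder URL
                new StringContent(JsonSerializer.Serialize(u), Encoding.UTF8, "application/json")));

        await Task.WhenAll(tasks);
    }
}

public class UpdateModel
{
    public int Id { get; set; }
    public string Payload { get; set; }
}
```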

I understand there is latency because of the Durable Functions infrastructure (communicating with the storage account and whatnot), but I don't see how that speed difference is normal. Each individual update happens in an ActivityTrigger function, and the orchestrator is the one that gathers all the necessary updates and puts them in a Task.WhenAll() call, just like the example from the Microsoft docs.
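
In outline, the orchestration follows the fan-out example from the docs. This is a hypothetical reconstruction, not the actual project code; the function and type names are placeholders:

```csharp
// Hypothetical reconstruction of the fan-out pattern described above.
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class UpdateOrchestration
{
    [FunctionName("UpdateOrchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        // The updates to perform (in the real app these might come from an activity).
        var items = context.GetInput<List<UpdateModel>>();

        // Fan out: one activity call per update. Each call is a round trip
        // through the task hub's storage queues.
        var tasks = new List<Task>();
        foreach (var item in items)
        {
            tasks.Add(context.CallActivityAsync("UpdateItem", item));
        }

        // Fan in: wait for all activities to complete.
        await Task.WhenAll(tasks);
    }

    [FunctionName("UpdateItem")]
    public static Task UpdateItem([ActivityTrigger] UpdateModel item)
    {
        // Each activity sends one HTTP request to the internal API (omitted here).
        return Task.CompletedTask;
    }
}

public class UpdateModel
{
    public int Id { get; set; }
}
```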

Am I doing something wrong here? Is this business scenario maybe not compatible with this technology? The code seems to work fine and the parallelism works; it's just a LOT slower than the .NET Core app. Another thing to mention is that the moment the function app opens a second instance (either because it's on the Consumption plan and naturally opens a second instance to deal with the heavy load, or because it's on an App Service plan and I manually open one), it goes even slower, although the CPU load somehow balances across the two instances. I suspect this could be extra latency from the Azure queue communication between the two instances, but I'm not entirely sure.

One last detail is that the app also has a TimerTrigger that does a simple select on a database every minute (nothing even remotely CPU intensive, but it might play a role in the performance).

I've tried the function app on a Premium plan, a Consumption plan, and an App Service plan, and it seems to top out at 130 updates in 10 minutes no matter how big the plan is.

  • What type of database are you updating? (SQL Server, Cosmos DB, .. ?) – Preben Huybrechts Sep 21 '20 at 09:10
  • @PrebenHuybrechts It's a simple MSSQL database, extremely lightweight with hardly any data in it so far. The project is in development. – Periklis Panagiotis Arnaoutis Sep 21 '20 at 10:44
  • Are you using SQL Database (SQL Server PaaS)? How many DTUs have you assigned to it? – Thiago Custodio Sep 21 '20 at 13:48
  • @ThiagoCustodio No I'm not, I'm using MSSQL with Entity Framework. The database operations don't really make any difference since they're minimal (just once a minute). I'm using the lowest possible DTU tier on the database. The job that takes time isn't doing database operations though, it's just making a ton of API calls. Only the timer trigger is doing one SQL database operation per minute. – Periklis Panagiotis Arnaoutis Sep 21 '20 at 15:05
  • You're wrong. First, you can't compare local performance with cloud performance. Just to reach your database, the call passes through multiple hops, which increases the expected time to execute the function. Second, DTUs are actually what will increase the performance of this process. I'm 100% sure that if you increase them you'll get better results. Also, try writing the update statement yourself rather than using the EF-generated one. – Thiago Custodio Sep 21 '20 at 15:47
  • @ThiagoCustodio Following your advice I did that. Nothing changed at all. I used the 126-euro plan instead of the 4-euro plan on Azure and got the exact same result. This timer trigger function is a completely different function that has nothing to do with the load I'm interested in; it's just in the same function app, which is why I mentioned it. The business logic I'm talking about is in another HTTP function. And the SQL database of the API is actually only on a 12-euro plan, although with that same DB, when I call the API from the .NET Core 3.1 app I get 130 updates in 0.5 minutes, no problem! – Periklis Panagiotis Arnaoutis Sep 22 '20 at 08:10
  • Well... to me the problem was the small amount of capacity allocated to your SQL. I guess you'll need to troubleshoot and find out what the problem is. Not sure what other advice I could give at this moment. – Thiago Custodio Sep 22 '20 at 13:35
  • @ThiagoCustodio Thank you for your input. Do you know of any performance benchmarks that use the Azure Durable Functions framework to do huge parallel async workloads using the fan-out strategy? I'd like to compare my code to some kind of benchmark that's made to showcase this functionality, but I can't find any online. – Periklis Panagiotis Arnaoutis Sep 22 '20 at 14:29
  • I guess you'll need to add some tracing to your logs, then you can figure out which part is the bottleneck. Also, maybe Application Insights can give you some help too. – Thiago Custodio Sep 22 '20 at 15:01
  • @ThiagoCustodio I've done all that. I've seen the timings through Application Insights and custom logging; at one point I even put in file system logging just to be sure. It all shows that the Durable Functions concurrency is just that much slower than expected. I'm slowly drawing the conclusion that my business requirement was just never a good fit for this technology. – Periklis Panagiotis Arnaoutis Sep 22 '20 at 15:28
  • I agree that it seems off for it to take that long with Functions, even with the overhead. The expected maximum throughput for the fan-out/fan-in pattern is about 100 action items per second, so the framework should be able to handle it. https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-perf-and-scale#performance-targets – PerfectlyPanda Sep 23 '20 at 01:13
  • A couple of things to look into: In similar scenarios I've found that the scale controller doesn't always spin up enough instances to cover a burst of requests (like 130 activity functions) when the app is new. Over time it does learn the pattern and adds workers more quickly. Another thing to look at with Durable Functions specifically is your storage account. Because Durable Functions interacts with storage so much, it is possible to get throttled from that side. A third scenario might be getting throttled at the database if you aren't pooling connections. – PerfectlyPanda Sep 23 '20 at 01:16
  • @SamaraSoucy-MSFT Thank you for your input. If that's the case and the worker throttles itself until it's warmed up, then it takes a very long time to do so. I have tested for hours on end (more than 8 hours) and seen the same result. We have finally chosen to move the project to a .NET Core 3 API instead of the Azure durable function, and this is working much better for us. I'm closing the issue here so people can have it as a test case in case they run into the same problems I had. – Periklis Panagiotis Arnaoutis Sep 29 '20 at 07:04
  • @PeriklisPanagiotisArnaoutis I'm the PM of Durable Functions. Can you explain what the app is doing in the activity function (also what are the input and output) and how long a single execution of the activity function is expected to take? I don't think SQL performance is the issue here. As you mentioned, this could be caused by the overhead added by the Storage operations. With more information, I can help you understand whether Durable Functions is a good fit and whether there are things you can do to make it perform better. – Anthony Chu Sep 29 '20 at 19:24
  • @PeriklisPanagiotisArnaoutis Also want to understand the machine that you're testing your .net core app on. How many CPUs does it have? – Anthony Chu Sep 29 '20 at 19:26
  • @AnthonyChu Thanks for taking the time to respond to this. The app is sending HTTP requests to an API in order to create items. It also uploads an image, again by sending it to a second HTTP API. I have tried scenarios with and without the image upload and I don't see a huge difference. I have a sub-orchestrator function that executes the single HTTP call through a single activity trigger, and I use the fan-out strategy to populate all the parallel calls to the sub-orchestration, each with a different model of course. – Periklis Panagiotis Arnaoutis Sep 30 '20 at 11:09
  • @AnthonyChu I also use Durable Entities in order to save state information within the sub-orchestrator to use later on in a report activity, because I couldn't figure out a way to do it in memory in the base orchestrator, and it generally didn't seem like best practice after reading all the Durable framework documentation. The machine for my .NET Core app is on the S1 plan, it's 100 ACUs, and it has the significantly greater performance. I explain that in my answer to Chris Gillum and also expand on the reason I chose the Durable framework. – Periklis Panagiotis Arnaoutis Sep 30 '20 at 11:15
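
To picture the shape described in the last two comments, here is a rough, hypothetical sketch (all names are placeholders, not the actual project code) of a sub-orchestration that performs one HTTP call through an activity and signals a durable entity with state for a later report step:

```csharp
// Hypothetical sketch of the sub-orchestration + durable entity shape described above.
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class ItemSubOrchestration
{
    // The parent orchestrator fans out with one
    // context.CallSubOrchestratorAsync("ItemSubOrchestrator", item) per item.

    [FunctionName("ItemSubOrchestrator")]
    public static async Task RunSub(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var item = context.GetInput<ItemModel>();

        // One activity per sub-orchestration: the single HTTP call to the API.
        await context.CallActivityAsync("CreateItem", item);

        // Record per-item state in a durable entity for a later report activity.
        var entityId = new EntityId("ReportState", "current-run");
        context.SignalEntity(entityId, "Add", item.Id);
    }

    [FunctionName("CreateItem")]
    public static Task CreateItem([ActivityTrigger] ItemModel item)
    {
        // HTTP request to the internal API goes here (omitted).
        return Task.CompletedTask;
    }

    [FunctionName("ReportState")]
    public static void ReportState([EntityTrigger] IDurableEntityContext ctx)
    {
        // A simple entity that accumulates the ids of processed items.
        if (ctx.OperationName == "Add")
        {
            var ids = ctx.GetState<List<int>>(() => new List<int>());
            ids.Add(ctx.GetInput<int>());
            ctx.SetState(ids);
        }
    }
}

public class ItemModel
{
    public int Id { get; set; }
}
```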

1 Answer


Speaking generally, TPL will almost always be much faster than Durable Functions because all the coordination is done in-memory (assuming you don't completely exhaust system resources doing everything on one machine). So that part is often expected. Here are a few points worth knowing:

  • Each fan-out to an activity function involves a set of queue transactions: one message for calling the activity function and one message for handing the result back to the orchestrator. When multiple VMs are involved, you also have to worry about queue polling delays.
  • By default, the per-instance concurrency for activity functions is limited to 10 on a single-core VM. If your activity functions don't require much memory or CPU, then you'll want to crank up this value to increase per-instance concurrency (see the host.json sketch after this list).
  • If you're using the Azure Functions Consumption or Premium plans, it will take 15-30 seconds before new instances get added for your app. This matters mainly if your workload can be done faster by running on multiple machines. The amount of time a message spends waiting on a queue is what drives scale-out (1 second is considered too long).
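
To illustrate the second point above: the per-instance throttles are set in host.json. A minimal sketch, assuming the Durable Functions 2.x extension (property names and defaults can differ between extension versions, and the numbers here are arbitrary examples, not recommendations):

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentActivityFunctions": 50,
      "maxConcurrentOrchestratorFunctions": 10
    }
  }
}
```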

You can find more details on this in the Durable Functions Performance and Scale documentation.

One last thing I will say is that the key value-add of Durable Functions is orchestrating work in a reliable way in a distributed environment. However, if your workload isn't long-running, doesn't require strict durability/resilience, doesn't require scale-out to multiple VMs, and you have strict latency requirements, then Durable Functions might not be the right tool. If you just need a single VM and want low latency, then a simple function that uses in-memory TPL may be a better choice.

Chris Gillum
  • I have tried increasing the concurrency from the default of 10 and noticed no improvement; the app opened a second instance and from then on it was even slower than with one instance. Your points seem to confirm my suspicions. The reason I chose the Durable framework is that this job is going to be long running. I wanted to have a maximum running cap of 2 hours, but I need maximum performance in those hours. In the Core web API that I made to replace the Functions technology, in one hour I make up to 9,000 or more database updates. In Durable this takes more than 10 hours (I stopped at that point). – Periklis Panagiotis Arnaoutis Sep 30 '20 at 10:57