Our issue is that Azure App Service (S3 x 5 Instances) is not evenly distributing requests across the 5 instances. The result is that one instance is getting swamped with requests and our overall P50 & P95 response time SLA for that app service is being breached.
I've confirmed that the App Service has ARR Affinity turned off. It's a completely stateless web API so there's nothing inherently sticky about it.
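(For reference, a query along these lines against the App Insights requests table, using the standard client_IP and cloud_RoleInstance fields, can show whether any individual client is being pinned to a single instance; the time window is just an example.)

requests
| where timestamp between (datetime("2020-06-25 00:00:00") .. datetime("2020-06-25 08:00:00"))
// top client/instance pairs by request volume; a dominant pair would suggest stickiness
| summarize hits = sum(itemCount) by client_IP, cloud_RoleInstance
| order by hits desc
| take 20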
Tech details are below, but the question is essentially this:
Why isn't Azure evenly distributing / round-robining my traffic across all 5 instances?
As it stands, scaling up or out doesn't seem to make sense here, because I'd just end up with additional expensive instances sitting idle while one instance gets swamped.
Technical Details
The following two charts from Application Insights, covering June 1st and June 25th, show the issue. Both were produced with this query (adjusting the dates accordingly):
requests
| where timestamp > datetime("2020-06-25 00:00:00")
| where timestamp < datetime("2020-06-25 08:00:00")
// comparison between 00:00-08:00 on June 1st vs. today
| where url contains "**omitted**"
| project cloud_RoleInstance, itemCount, timestamp = bin(timestamp, 1h)
| evaluate pivot(cloud_RoleInstance, sum(itemCount))
| render timechart
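The per-instance counts and percentages quoted below can also be pulled directly with a variant like this (the let-bound window and the share_pct name are just illustrative):

let startTime = datetime("2020-06-25 00:00:00");
let endTime = datetime("2020-06-25 08:00:00");
// total requests in the window, used to compute each instance's share
let total = toscalar(
    requests
    | where timestamp between (startTime .. endTime)
    | summarize sum(itemCount));
requests
| where timestamp between (startTime .. endTime)
// apply the same url filter as in the chart query if needed
| summarize hits = sum(itemCount) by cloud_RoleInstance
| extend share_pct = round(100.0 * hits / total, 1)
| order by hits desc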
This first image below shows the traffic distribution on June 1st. It's not perfectly even, but it's close: the 3rd instance is taking on about 50% more traffic than the 5th.
Instance:   1        2        3        4        5
Requests:   34,708   26,436   38,313   30,617   24,355
Share:      22%      17%      25%      20%      16%
This next image below shows the traffic distribution for the same time frame this morning (June 25th). The 4th instance is handling roughly 2.5x the traffic of the next-busiest instance and about 7x the traffic of instance 1.
Instance:   1        2        3        4        5
Requests:   11,980   21,671   34,180   85,041   24,508
Share:      7%       12%      19%      48%      14%
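(To back up the "one instance swamped, the rest idle" observation, per-instance CPU can be charted the same way. This sketch assumes the default App Insights performance counters are enabled and uses the "% Processor Time" counter name.)

performanceCounters
| where timestamp between (datetime("2020-06-25 00:00:00") .. datetime("2020-06-25 08:00:00"))
| where name == "% Processor Time"
// average process CPU per instance in 15-minute buckets
| summarize avg_cpu = avg(value) by cloud_RoleInstance, bin(timestamp, 15m)
| render timechart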