5

It seems that if you set a Scale Set's overprovision property to true (https://azure.microsoft.com/en-us/documentation/articles/virtual-machine-scale-sets-design-overview/#overprovisioning), you get an invalid SF cluster: the unnecessary VMs that were deleted appear as bad nodes in the cluster.

Is there a way of making it work? Overprovisioning can really help the deployment succeed when you have multiple VM extensions.

Eli Arbel

1 Answer

8

For a Service Fabric cluster, the VMs must be allocated across fault domains (FDs) and upgrade domains (UDs); we use an availability set to force such an allocation. This topology is then used to elect voter nodes and to place system service replicas as well as customers' service instances and replicas. When you specify Overprovision = true, Azure provisions more VMs than you ask for and then randomly removes the extra ones once the requested number is reached. This results in an uneven distribution of VMs across FDs and UDs, and hence possibly a very badly configured cluster.

The deleted, unnecessary VMs appear as bad nodes because they originally joined the cluster and were then deleted, so Service Fabric still thinks they will eventually come back. We can certainly do an upgrade to fix that issue, but you cannot fix the uneven distribution of nodes.

So: always set Overprovision = false in your VMSS deployments.
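Until something validates this for you, a small pre-deployment check can catch it. The following is only a minimal sketch, assuming an ARM template in which scale sets are top-level resources of type Microsoft.Compute/virtualMachineScaleSets with the flag at properties.overprovision. The function name and the sample resource names (nt1vm, nt2vm) are made up for illustration, and treating a missing flag as a problem is my own assumption.

```python
import json

VMSS_TYPE = "Microsoft.Compute/virtualMachineScaleSets"


def overprovisioned_scale_sets(template):
    """Return names of scale sets whose overprovision flag is not explicitly false.

    Hypothetical helper: only top-level resources are inspected, and a
    missing flag is reported too (assumption: it should be set explicitly).
    """
    offenders = []
    for resource in template.get("resources", []):
        if resource.get("type") != VMSS_TYPE:
            continue
        value = resource.get("properties", {}).get("overprovision")
        # Templates write the flag as a JSON boolean or as the string "false".
        if str(value).lower() != "false":
            offenders.append(resource.get("name", "<unnamed>"))
    return offenders


# Sample template fragment; the resource names are placeholders.
template = json.loads("""
{
  "resources": [
    {"type": "Microsoft.Compute/virtualMachineScaleSets", "name": "nt1vm",
     "properties": {"overprovision": false}},
    {"type": "Microsoft.Compute/virtualMachineScaleSets", "name": "nt2vm",
     "properties": {}}
  ]
}
""")

print(overprovisioned_scale_sets(template))
```

Here nt2vm is reported because the flag is missing, while nt1vm passes. Nested resources and template expressions (for example a parameter reference as the value) are not handled by this sketch.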

chacko-AMZN
  • this should be validated as a part of template validation, what's the point of allowing to deploy potentially broken SF clusters? – illegal-immigrant Jan 24 '17 at 03:34
  • Opened an issue for this https://github.com/Azure/service-fabric-issues/issues/145 – illegal-immigrant Jan 24 '17 at 03:39
  • The official documentation makes it sound like Overprovisioning works correctly for VMSS at this point https://learn.microsoft.com/en-us/azure/virtual-machine-scale-sets/virtual-machine-scale-sets-design-overview#overprovisioning – Marcin Oct 08 '18 at 21:57