Why is salt-cloud so slow compared to Terraform?

Question

I'm comparing salt-cloud and Terraform as tools to manage our infrastructure on GCE. We use SaltStack to manage VM configurations, so I would naturally prefer to use salt-cloud as an integral part of the stack and phase out Terraform as a legacy tool.

However, my use case is sensitive to VM deployment time: we offer a PaaS solution where VMs are deployed on customer request, so we need to deliver ready VMs within seconds of a button click.

And what puzzles me is why salt-cloud takes so long to deploy basic machines.

I created a simple head-to-head test, deploying three VMs from the default CentOS 7 image with both Terraform and salt-cloud (both in parallel mode). The time difference is stunning: Terraform needs around 30 seconds to deploy the requested machines (similar to deploying through the GCE GUI), while salt-cloud takes around 220 seconds to deploy exactly the same machines under the same account in the same zone. Especially strange is that for the first 130 seconds salt-cloud does not start deploying and seemingly does nothing at all. Only after about 130 seconds does it print a message that it is deploying the VMs, and only then do those VMs appear in the GUI as being deployed.

Is there something obvious I'm missing about salt-cloud that makes it so slow? Can it be sped up somehow? I would prefer to use the full Salt stack, but with its current speed issues I can't really afford to.

After salt-cloud provisions a VM, it also tries to install salt-minion on the target VM and to configure the master-minion connection. In addition, did you try using the `-P` (parallel) switch to create all three minions in parallel rather than waiting for them in sequence? Terraform doesn't have the minion configuration part, which saves time. — mootmoot, Jul 11 '16 at 07:26
I measured the time from the start until the end of the host bootstrap. salt-cloud tells you in the console when the initial machine bootstrap is over and it moves on to provisioning, i.e. minion installation, master-minion connection and certificates, further provisioning according to Salt states, etc. So that part is excluded from the timeline I described. And yes, I used the --parallel flag. TBH I don't understand why this flag exists; to me it should be parallel by default. — alexykot, Jul 12 '16 at 10:08

score 4 · Answer 1 · answered Jul 02 '17 at 12:57

Note that this answer is speculation based on what I understand about Terraform and salt-cloud; I haven't verified it experimentally!

I think the reason is that Terraform keeps state of the previous run (either locally or remotely), while salt-cloud doesn't keep state and so queries the cloud before actually provisioning anything.

One of these two approaches (keeping state, or querying before doing anything) is needed because both tools are idempotent (you can safely run them multiple times).
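To make the speculation concrete, here is a toy Python model of the two strategies. `FakeCloud`, `apply_with_state` and `apply_by_querying` are hypothetical names, and this is not how either tool is actually implemented (Terraform, for instance, also refreshes the resources recorded in its state by default). It only shows that a tool which queries the cloud before acting pays for at least one extra API round trip on every run.

```python
import time


class FakeCloud:
    """Toy stand-in for a cloud API (hypothetical); every call costs `latency` seconds."""

    def __init__(self, latency=0.01):
        self.latency = latency
        self.instances = set()
        self.calls = 0

    def list_instances(self):
        self.calls += 1
        time.sleep(self.latency)
        return set(self.instances)

    def create(self, name):
        self.calls += 1
        time.sleep(self.latency)
        self.instances.add(name)


def apply_with_state(cloud, wanted, state):
    """Stateful style: trust the recorded state and only call the cloud to create."""
    for name in sorted(wanted - state):
        cloud.create(name)
        state.add(name)


def apply_by_querying(cloud, wanted):
    """Stateless style: ask the cloud what already exists before creating anything."""
    existing = cloud.list_instances()
    for name in sorted(wanted - existing):
        cloud.create(name)


wanted = {"vm1", "vm2", "vm3"}

stateful_cloud, state = FakeCloud(), set()
apply_with_state(stateful_cloud, wanted, state)
apply_with_state(stateful_cloud, wanted, state)  # no-op: state already matches
print(stateful_cloud.calls)  # 3 (three creates, no reads)

querying_cloud = FakeCloud()
apply_by_querying(querying_cloud, wanted)
apply_by_querying(querying_cloud, wanted)  # one list call, then nothing to do
print(querying_cloud.calls)  # 5 (one list call per run plus three creates)
```

Both styles converge on the same three instances; they differ only in where the knowledge of "what already exists" comes from. How many queries salt-cloud really makes, and how long each takes, is not known from this model.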

For example, I think that if you remove Terraform's state file and re-run it, it will assume there is nothing in the cloud and will actually instantiate a duplicate. This is not to imply that Terraform does it wrong; it is to show that state is important, and the Terraform docs clearly say that when operating in a team the state should be stored remotely, precisely to avoid this kind of problem.
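The lost-state scenario can be sketched with a toy cloud that assigns a fresh ID to every instance. This is an assumption: on GCE, instance names must be unique within a zone, so a real re-run there might fail with a conflict instead of silently duplicating. `ToyCloud` and `apply` are hypothetical names used only for illustration.

```python
import itertools


class ToyCloud:
    """Toy cloud (hypothetical): each create() returns a new ID, so duplicates are possible."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.instances = {}  # instance id -> logical name

    def create(self, name):
        instance_id = f"i-{next(self._ids)}"
        self.instances[instance_id] = name
        return instance_id


def apply(cloud, wanted, state):
    """Create every wanted name that the state (a dict name -> id) does not record."""
    for name in sorted(wanted):
        if name not in state:
            state[name] = cloud.create(name)


cloud = ToyCloud()
wanted = {"web"}
state = {}
apply(cloud, wanted, state)
apply(cloud, wanted, state)  # idempotent: the state already records "web"
print(len(cloud.instances))  # 1

state = {}  # the state file was deleted
apply(cloud, wanted, state)
print(len(cloud.instances))  # 2: a second "web" instance now exists
```

The tool is only idempotent relative to its state; once the state is gone, nothing tells it that "web" already exists.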

Following my line of thought, this should also mean that if you either run salt-cloud in verbose debug mode or look at the network traffic it generates, then during the first 130 seconds you mention (before it says "deploying VMs") you should see queries from salt-cloud to the cloud provider to dynamically construct the state.
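One way to check this is to timestamp every line the command prints and look for the longest silent gap. Below is a minimal Python sketch, assuming you run salt-cloud through it; `run_with_timestamps` and `largest_gap` are hypothetical helper names, and the profile and VM names in the comment are placeholders.

```python
import subprocess
import sys
import time


def run_with_timestamps(argv, out=sys.stdout):
    """Run argv, echo each output line prefixed with seconds since start.

    Returns a list of (elapsed_seconds, line) tuples so long silences are easy to find.
    """
    start = time.monotonic()
    lines = []
    with subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # debug logs usually go to stderr; merge them in
        text=True,
        bufsize=1,  # line-buffered on our side of the pipe
    ) as proc:
        for line in proc.stdout:
            elapsed = time.monotonic() - start
            line = line.rstrip("\n")
            print(f"[{elapsed:7.2f}s] {line}", file=out, flush=True)
            lines.append((elapsed, line))
    return lines


def largest_gap(lines):
    """Return (gap_seconds, line_after_gap) for the longest silence, counted from start."""
    best = (0.0, None)
    previous = 0.0
    for elapsed, line in lines:
        if elapsed - previous > best[0]:
            best = (elapsed - previous, line)
        previous = elapsed
    return best


# Example (placeholder profile and VM names; -l sets salt-cloud's log level,
# -P creates the VMs in parallel, -p selects the profile):
# lines = run_with_timestamps(
#     ["salt-cloud", "-l", "debug", "-P", "-p", "centos7", "vm1", "vm2", "vm3"])
# print(largest_gap(lines))
```

If the child program buffers its stdout when writing to a pipe, lines arrive in bursts and the timestamps bunch up; for Python programs such as salt-cloud, setting `PYTHONUNBUFFERED=1` in the child's environment avoids that.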

Last point: the fact that salt-cloud doesn't store the state of a previous run doesn't automatically make it safe to use in a team environment. It is safe only as long as no two team members run it at the same time. Terraform with remote state on Consul, on the other hand, supports locking, so concurrent team usage is always safe.
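The idea behind locking can be illustrated with a single-machine stand-in: an exclusive lock file created atomically before a provisioning run. This is a hedged Python sketch, not Consul's or Terraform's API; `exclusive_run`, `LockHeld` and the default lock path are hypothetical, and a file lock only protects runs that share the same filesystem, whereas a remote backend such as Consul extends the guarantee across machines.

```python
import contextlib
import os
import tempfile

# Hypothetical default location for the lock file.
DEFAULT_LOCK_PATH = os.path.join(tempfile.gettempdir(), "provisioning.lock")


class LockHeld(RuntimeError):
    """Raised when another provisioning run already holds the lock."""


@contextlib.contextmanager
def exclusive_run(lock_path=DEFAULT_LOCK_PATH):
    """Allow only one provisioning run at a time.

    O_CREAT | O_EXCL makes creation atomic: if the file already exists,
    another run holds the lock and we refuse to start instead of racing it.
    """
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeld(f"another run holds {lock_path}") from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record who holds the lock
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)  # release so the next run can proceed


# Usage (the provisioning step itself is left out):
# with exclusive_run():
#     ...  # run salt-cloud here
```

A run that crashes hard (for example, killed with SIGKILL) leaves a stale lock file behind, which is one of the problems a real lock service has to handle for you.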

