Puppet splay & splaylimit explained?

Question

I'm looking for someone to explain the usage of splay & splaylimit within Puppet configuration.

The documentation on the Puppet site itself is limited, to say the least. I am suffering from a thundering herd problem on my master: a number of agents hammer the master for their catalogs all at once, to the point where the master falls over and each agent reports a timeout error.

I know I need to use the splay & splaylimit options in my config to stop all agents checking in at once, but I'm unsure of how to implement it. Can anyone assist please?

score 10 · Answer 1 · answered Oct 02 '15 at 13:43

The splay and splaylimit settings work together with the runinterval setting to help spread agents' catalog requests out over time. They are useful primarily when many machines' agents may be started at once, such as when a bunch of VMs all start up together under control of the same host.

Ordinarily, the agent, when running in daemon mode, starts a catalog run when it first starts up, and again at runinterval intervals. If the splay option is set to true, it instead generates a (pseudo-)random delay, not exceeding splaylimit, and delays the start of each catalog run by that amount of time, relative to when it would have started if splaying were disabled.
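A minimal Python sketch of that scheduling rule, under the simplifying assumption that each agent draws one delay when it starts and every later run is shifted by that same delay (Puppet itself is written in Ruby, and the exact mechanics differ between versions). `agent_run_times` is a hypothetical helper, not part of Puppet.

```python
import random

def agent_run_times(start, runinterval, splay, splaylimit, runs, rng):
    """Start times (seconds) of one daemonized agent's first `runs` catalog runs.

    Simplified model, not Puppet's real code: when splay is enabled the agent
    draws ONE pseudo-random delay in [0, splaylimit] and every run is shifted
    by it; without splay, runs fall exactly on start + i * runinterval.
    """
    delay = rng.uniform(0, splaylimit) if splay else 0.0
    return [start + delay + i * runinterval for i in range(runs)]

rng = random.Random(42)
print(agent_run_times(0, 1800, False, 1800, 3, rng))  # [0.0, 1800.0, 3600.0]
print(agent_run_times(0, 1800, True, 1800, 3, rng))   # same spacing, shifted by a random delay
```

The spacing between runs stays at runinterval either way; splay only moves where in the cycle a given agent lands.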

Thus, if you have a thundering herd problem arising from many agents being started at about the same time, then you could try to address it by setting

splay = true

in your agents' configurations. If you don't configure a specific splaylimit then it defaults to your runinterval, resulting in the catalog runs of all the agents started at the same time being spread more or less uniformly over the whole interval, and therefore over all time going forward.
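To make the effect concrete, here is a small Python simulation. It is a toy model rather than Puppet's own code: 600 agents all start at t = 0, each draws one uniform delay in [0, splaylimit] when splay is on, and the sketch counts how many catalog requests land in the busiest one-minute slot of the default 1800-second runinterval. The agent count, the one-minute slot size, and the helper names are illustrative assumptions.

```python
import random

RUNINTERVAL = 1800  # Puppet's default runinterval: 30 minutes, in seconds

def first_run_offsets(n_agents, splay, splaylimit=RUNINTERVAL, seed=0):
    """Offsets (seconds) at which n agents, all started at t=0, request catalogs.

    Simplified model: one uniform delay per agent when splay is on, else 0.
    """
    rng = random.Random(seed)
    return [rng.uniform(0, splaylimit) if splay else 0.0 for _ in range(n_agents)]

def busiest_minute(offsets):
    """Largest number of catalog requests landing in any one minute of the cycle."""
    counts = {}
    for t in offsets:
        minute = int(t // 60) % (RUNINTERVAL // 60)  # wrap into the 30-minute cycle
        counts[minute] = counts.get(minute, 0) + 1
    return max(counts.values())

herd = first_run_offsets(600, splay=False)
spread = first_run_offsets(600, splay=True)
print(busiest_minute(herd))    # 600: every agent asks in the same minute
print(busiest_minute(spread))  # far lower: about 20 per minute on average, random peaks a bit higher
```

Because each agent keeps its offset on every later run, the flattened load persists from cycle to cycle.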

On the other hand, if your agents' startup is not somehow orchestrated so as to cause them to bunch up, then splaying doesn't really do anything for you. That is, if agent startups are approximately random anyway then it doesn't help you to shift their catalog request cycles.
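The same kind of toy model shows why splay buys little when startups are already scattered. Here the 600 agents (an assumed number) start at uniformly random moments within the runinterval, and splaying adds another random shift; the busiest one-minute slot looks much the same before and after. The helper name is hypothetical.

```python
import random

RUNINTERVAL = 1800  # default runinterval, in seconds

def busiest_minute(offsets):
    """Largest number of catalog requests landing in any one minute of the cycle."""
    counts = [0] * (RUNINTERVAL // 60)
    for t in offsets:
        counts[int(t // 60) % len(counts)] += 1
    return max(counts)

rng = random.Random(7)
# Agents whose daemons start at unrelated, effectively random moments.
starts = [rng.uniform(0, RUNINTERVAL) for _ in range(600)]
# Splaying shifts each agent by another random delay; positions wrap
# around because the runs repeat every RUNINTERVAL seconds.
splayed = [(s + rng.uniform(0, RUNINTERVAL)) % RUNINTERVAL for s in starts]

print(busiest_minute(starts), busiest_minute(splayed))  # similar peaks either way
```

A random shift applied to an already uniform spread just produces another uniform spread, which is the point made above.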

I think splay can also help when you run the agent in --onetime mode via an external scheduler (e.g. cron). That would present a good use case for the splaylimit setting, because in that case the configured runinterval has nothing to do with when or how frequently the agent runs.

Thanks John, that is my understanding exactly, so I am glad to see someone confirm it. We don't have our runinterval set, so it's at the default of 30 minutes. Where should the splay = true option go? I have it in my puppet.conf under the agent section, but the agent runs do not seem to be staggered in a way that prevents the thundering herd. Does it need to be in the Puppet master's puppet.conf? If so, where would it go? Under the master section? — LLB3000, Oct 02 '15 at 14:17
@CoolHandLuke, the `splay` option should be set in the `[agent]` section of the agents' `puppet.conf`. Be sure you're working with the right `puppet.conf`, as the directory in which Puppet looks for it varies with the process's user id and command-line options. Restart the agents after making the change. Also, make sure it really is a thundering herd problem, and not a more general server capacity problem. — John Bollinger, Oct 02 '15 at 14:28
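Since the comment above stresses editing the right file and the right section, a small Python check can help when auditing a copy of puppet.conf. It is a sketch under stated assumptions: it parses the file as plain INI with the standard-library `configparser`, and it applies only the simplified rule that `[agent]` overrides `[main]` while `[master]` is ignored by agents. `effective_agent_value` and `puppet.example.com` are hypothetical names.

```python
import configparser

def effective_agent_value(conf_text, key):
    """Return the value an agent would read for `key`, or None if unset.

    Simplified model: [agent] overrides [main]; [master] is ignored by agents.
    Assumes puppet.conf parses as plain INI; interpolation is disabled so
    Puppet-style $variables are left untouched.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    for section in ("agent", "main"):
        if parser.has_option(section, key):
            return parser.get(section, key)
    return None

conf = """
[main]
server = puppet.example.com

[master]
splay = true

[agent]
runinterval = 1800
"""
print(effective_agent_value(conf, "splay"))        # None: the [master] entry is ignored
print(effective_agent_value(conf, "runinterval"))  # 1800
```

On a live agent, asking Puppet directly (for example `puppet config print splay --section agent`, run as the same user the agent runs as) is more authoritative than this sketch.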
