We recently started having trouble with chef-client dying in the middle of a run after taking a lot more time stuck on various parts of the run-list that normally proceeded much quicker. I've been on my home wifi and my colleague has been on the work wifi, which has been having some connectivity problems of its own.
If your ssh connection gets interrupted to a machine while chef-client is running, does that crash the run in seemingly inexplicable ways? I am using PutTY to connect from my Win7 and my colleague is using the Apple Terminal App.
All the machines we've been running this on are Ubuntu 12.04 (in EC2) and have plenty of disk space left over - they're only utilizing ~1GB with ~5GB free.
Here is the output of the log from /var/log/chef/client.log
(set with the log_location
directive in /etc/chef/client.rb
as described here).
[2014-01-08T00:27:07+00:00] WARN: Nodejs user is nodejs
[2014-01-08T00:27:07+00:00] WARN: Cloning resource attributes for group[nodejs] from prior resource (CHEF-3694)
[2014-01-08T00:27:07+00:00] WARN: Previous group[nodejs]: /var/chef/cache/cookbooks/nodejs/recipes/default.rb:26:in `from_file'
[2014-01-08T00:27:07+00:00] WARN: Current group[nodejs]: /var/chef/cache/cookbooks/spicoli-app/recipes/default.rb:38:in `from_file'
[2014-01-08T00:27:07+00:00] WARN: Cloning resource attributes for user[nodejs] from prior resource (CHEF-3694)
[2014-01-08T00:27:07+00:00] WARN: Previous user[nodejs]: /var/chef/cache/cookbooks/nodejs/recipes/default.rb:34:in `from_file'
[2014-01-08T00:27:07+00:00] WARN: Current user[nodejs]: /var/chef/cache/cookbooks/spicoli-app/recipes/default.rb:46:in `from_file'
[2014-01-08T00:27:30+00:00] WARN: Environment is _default
[2014-01-08T00:27:30+00:00] WARN: Nodejs user is nodejs
[2014-01-08T02:04:54+00:00] ERROR: Running exception handlers
[2014-01-08T02:04:54+00:00] ERROR: Exception handlers complete
[2014-01-08T02:04:54+00:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
[2014-01-08T02:04:55+00:00] ERROR: Input/output error - <STDOUT>
[2014-01-08T02:04:57+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
And the error stacktrace just has this:
Generated at 2014-01-08 02:04:54 +0000
Errno::EIO: Input/output error - <STDOUT>
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:91:in `write'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:91:in `puts'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:91:in `puts'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:61:in `display_section'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:44:in `block (2 levels) in display'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:43:in `each'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:43:in `block in display'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:42:in `each'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/error_descriptor.rb:42:in `display'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:130:in `display_error'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/base.rb:161:in `resource_failed'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/formatters/doc.rb:159:in `resource_failed'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/event_dispatch/dispatcher.rb:29:in `block in resource_failed'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/event_dispatch/dispatcher.rb:29:in `each'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/event_dispatch/dispatcher.rb:29:in `resource_failed'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource.rb:637:in `rescue in run_action'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource.rb:643:in `run_action'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:49:in `run_action'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:81:in `block (2 levels) in converge'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:81:in `each'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:81:in `block in converge'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection.rb:98:in `block in execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:116:in `call'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:116:in `call_iterator_block'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:85:in `step'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:104:in `iterate'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection/stepable_iterator.rb:55:in `each_with_index'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/resource_collection.rb:96:in `execute_each_resource'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/runner.rb:80:in `converge'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:433:in `converge'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:500:in `do_run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:199:in `block in run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:193:in `fork'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/client.rb:193:in `run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application.rb:208:in `run_chef_client'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application/client.rb:312:in `block in run_application'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application/client.rb:304:in `loop'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application/client.rb:304:in `run_application'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/lib/chef/application.rb:66:in `run'
/opt/chef/embedded/lib/ruby/gems/1.9.1/gems/chef-11.8.0/bin/chef-client:26:in `<top (required)>'
/usr/bin/chef-client:23:in `load'
/usr/bin/chef-client:23:in `<main>'
Which is a really generic error! But it does seem to indicate an interruption to STDOUT output, which kind of makes sense with a client disconnection.
Edit: As requested, here are the contents of the client.rb
file (names obfuscated, naturally.)
$ cat /etc/chef/client.rb
log_level :auto
log_location "/var/log/chef/client.log"
chef_server_url "https://api.opscode.com/organizations/myapp"
validation_client_name "my-validator"
node_name "my-app-node"
Edit 2: Attempt using sudo su -s /bin/bash root -c "screen chef-client"
Screen terminated while I was at lunch and recorded a timeout on the ShellOut
command for npm install
. This was after chef-client was sitting stuck on this operation for over an hour.
[2014-01-09T16:39:07+00:00] WARN: Environment is _default
[2014-01-09T16:39:07+00:00] WARN: Nodejs user is nodejs
[2014-01-09T18:16:28+00:00] ERROR: Running exception handlers
[2014-01-09T18:16:28+00:00] ERROR: Exception handlers complete
[2014-01-09T18:16:28+00:00] FATAL: Stacktrace dumped to /var/chef/cache/chef-stacktrace.out
[2014-01-09T18:16:31+00:00] ERROR: execute[npm-install-app] (spicoli-app::default line 110) had an error: Mixlib::ShellOut::CommandTimeout: command timed out:
---- Begin output of npm --registry http://my.npm.repo.amazonaws.com:5984/registry/_design/app/_rewrite install --cache /home/nodejs/.npm --tmp /home/nodejs/tmp
--- snip: install messages from npm ---
[2014-01-09T18:16:33+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
This is a totally different error than before. The stacktrace.out
file also explicitly mentions ShellOut
, so it is entirely different as well. Most oddly, when I run the same npm command from the command line, in finishes in under a minute.
So I'm not sure there is a way to further diagnose the previous failure, but I would welcome other suggestions. For input on this new failure, I asked this followup question.