
I changed the log driver to "local" in Docker's daemon.json configuration file, because a high level of activity in the RADOS Gateway logs had saturated the disk space. My intent was to switch to journald so I could use logrotate. Unfortunately, after restarting the Docker daemon, many Ceph services disappeared, along with their container images. That node has now put the cluster into HEALTH_ERR, because it lost 1 mgr, 1 mon and 3 OSD services at the same time. I've tried running some ceph commands inside a cephadm shell (on another node), but they just hang and nothing happens. What can I try in order to restore the node's services and the cluster's health?
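For reference, the change I made looks roughly like this in `/etc/docker/daemon.json` (a minimal sketch, showing the `journald` value I was aiming for; the file may contain other keys on your system):

```json
{
  "log-driver": "journald"
}
```

Note that, as far as I understand, changing this setting only affects containers created after the daemon restart; existing containers keep the driver they were started with until they are recreated.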

Odysseo
  • HEALTH_ERR should not occur if only one node goes down. How exactly is your cluster configured (`ceph osd df tree`, `ceph -s`)? You can try to restart the failed daemons with `systemctl reset-failed ceph-@mgr.` and then `systemctl start ceph-@mgr.`, and the same for the other failed services. You can watch the logs with `journalctl -fu ceph-@mgr.` to see what goes wrong if it still fails. – eblock Aug 05 '22 at 10:58
  • Sorry, you're right, I gave very little information. Anyway, in the meantime, I managed to restore everything with a simple reboot. – Odysseo Aug 08 '22 at 19:00

0 Answers