I have a CephFS Octopus system running with two active metadata servers (MDS) and seven on standby in case of failures. The two active MDS run on more up-to-date machines with more RAM and CPU power, while the backup MDS are on older systems.

Of the backup MDS, one is preferred to take over (the reasons do not matter, only that it has good hardware capabilities). How can I set the order in which the backup daemons take over when an active MDS fails? Is that even possible?

I found no such option in the documentation, and after searching for a while, all the results just point me to the general MDS setup.

emil
  • I'm not aware of any mechanism to tell Ceph in which order the standby daemons are activated. What you could do (although it's not always recommended) is to set `allow_standby_replay` to `true`. This would assign a "hot standby" daemon to each of the active daemons, so two in your case. If those are not the ones you prefer, stop them and other daemons will take over. Once your desired daemon is the hot standby, you can start the others again. One question remains though: why use so many standby daemons? Are your daemons crashing often? One could argue that one standby daemon would be enough. – eblock Nov 15 '22 at 11:02
  • If one active daemon crashes, the standby takes over. In the meantime you need to fix whatever made it crash and bring it back online, and then it is a standby again. If both active daemons crash without a standby it would be bad, of course. How likely is that? Do you experience regular crashes? For such a case you could have two standby daemons, which should usually be enough. Or do you have other requirements I'm forgetting? – eblock Nov 15 '22 at 11:04
  • @eblock, thanks for pointing me to the `allow_standby_replay` parameter, this comes close to what I've been looking for. You may write a full answer and I'll accept it. – emil Nov 16 '22 at 15:50

1 Answer

What you could do (although it's not always recommended and depends on your actual use case) is set `allow_standby_replay` to `true`. This assigns one "hot standby" (standby-replay) daemon to each of the active daemons, so two in your setup. If those are not the ones you prefer, stop them and other standby daemons will take over that role. Once your desired daemon is the standby-replay daemon, you can start the others again. If an active daemon crashes, its standby-replay daemon takes over. In the meantime you need to fix whatever made it crash and bring it back online, and then it is a standby again.
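
As a rough illustration of the first step, here is a minimal Python sketch that builds the two CLI calls involved: `ceph fs set <fs_name> allow_standby_replay true` to enable standby-replay, and `ceph fs status <fs_name>` to see which daemons ended up in which role. The file system name `cephfs` and the helper names `standby_replay_commands` and `apply` are assumptions made for this example, not anything from your cluster. By default it only prints the commands; it does not stop or start any MDS daemon for you, since how that is done depends on your deployment.

```python
import shlex
import subprocess
from typing import List


def standby_replay_commands(fs_name: str, enable: bool = True) -> List[List[str]]:
    """Build the ceph CLI calls that toggle standby-replay and show the MDS layout.

    fs_name is a placeholder; use the name reported by `ceph fs ls`.
    """
    value = "true" if enable else "false"
    return [
        # Allow (or disallow) a standby-replay follower for each active rank.
        ["ceph", "fs", "set", fs_name, "allow_standby_replay", value],
        # Show which daemons are active, standby-replay, or plain standby.
        ["ceph", "fs", "status", fs_name],
    ]


def apply(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them on a node with ceph access."""
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file system name; dry run only prints the commands.
    apply(standby_replay_commands("cephfs"))
```

After enabling it, check the status output. If the standby-replay daemons are not the ones you want, stop them, wait until the preferred daemon shows up as standby-replay, and then start the others again.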

eblock