Cassandra version 1.2.9. Five-node cluster, but one of the nodes is down with a hardware failure, and the repair/replacement ETA is unknown. I want to decommission/remove the down node (the notifications are cluttering all the logs). `nodetool removenode` seems perfect, except that it requires a host ID. The down node has no host ID (it is listed as null in the status output).

It appears that removetoken is no longer an option for nodetool.

What is the proper way to remove this dead node?

Sean Durity
  • I was going to give you a link to DataStax's "replacing a dead node" documentation, but the last step is to do a `nodetool removenode`, which you have stated won't work for you. Have you tried looking at `nodetool move`? In theory you could bring up the replacement node (with the initial_token set to the dead node's, minus 1) and then move it to the desired token range. – Aaron Dec 12 '13 at 17:44
  • I do not have a replacement server yet. I tried removetoken; that is deprecated in this version. I tried removenode, but there is no host ID, so it fails. I tried move, and it failed because there were not enough streaming sources. Next up was a JMX command: Entering shell mode. % jmx_invoke -m org.apache.cassandra.net:type=Gossiper unsafeAssassinateEndpoint . That failed with a null pointer exception. Still working on this. – Sean Durity Dec 12 '13 at 21:34
  • The following steps DID work:
    1) On each remaining node, add `-Dcassandra.load_ring_state=false` to the JVM_OPTS in your cassandra-env.sh file and restart.
    2) Run `nodetool status` again and confirm that the unwanted node is gone.
    3) On each node, run `delete from system.peers where peer = '[ip address of dead node]';` via cqlsh.
    4) Remove `-Dcassandra.load_ring_state=false` from cassandra-env.sh on each node and restart.
    – Sean Durity Dec 19 '13 at 20:13
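
The four steps in the last comment can be sketched as a small dry-run script. It only prints the line to add to cassandra-env.sh and the CQL statement to run in cqlsh; it does not touch a cluster or restart anything. The address `10.0.0.5` and the variable names are hypothetical placeholders, not taken from the original post.

```shell
#!/bin/sh
# Dry-run sketch: prints what to apply on each remaining node; runs nothing itself.

# Hypothetical placeholder; substitute the IP address of the dead node.
DEAD_NODE_IP="${DEAD_NODE_IP:-10.0.0.5}"

# Steps 1 and 4: line to append to cassandra-env.sh before the first restart,
# and to delete again before the second restart.
ring_state_line='JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"'

# Step 3: statement to run in cqlsh on every remaining node.
peers_cleanup_cql="DELETE FROM system.peers WHERE peer = '${DEAD_NODE_IP}';"

printf '%s\n' "# add to cassandra-env.sh, restart, then check nodetool status:"
printf '%s\n' "$ring_state_line"
printf '%s\n' "# run in cqlsh on each node, then remove the line above and restart:"
printf '%s\n' "$peers_cleanup_cql"
```

Printing the actions rather than executing them keeps the sketch safe to run anywhere; the restarts and the `nodetool status` check in step 2 still have to be done by hand on each node.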

1 Answer

The question is about an older version of Cassandra, but the official solution should apply to any version of Cassandra: https://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsGossipPurge.html

The steps there essentially say to stop the cluster, delete the SSTables related to peering, clear the gossip state, and restart the cluster, which makes sense.

Manojkumar Khotele