
I've got a few nodes in a ring with replication factor 3, and I'm trying to change the hardware on one of the nodes. What's happening is that I'm getting a streaming failure exception.

I've tried a few times, always with the same failure. The upstream node (10.0.10.54) is dreadfully out of space, and it's not realistic to compact or do any sstable operations on it. What I would like to do (a rough command sketch follows the list) is:

  1. Bring up a new node with all the data streamed prior to the failed event
  2. Run a repair on it (nodetool repair -pr)
  3. Decommission the 10.0.10.54 node
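For reference, a rough sketch of the commands that steps 2 and 3 imply (a sketch only, assuming the new node already holds the SSTables copied before the stream failed; step 1 has no single command):

    # Step 2: on the new node, repair only the ranges it is primary for
    nodetool repair -pr

    # Step 3: on 10.0.10.54 itself, retire it from the ring once the new node is serving
    nodetool decommission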

What I can't figure out is this: every time I bring up the new node it goes into JOINING, and what I want is to force it into RUNNING with the data it has already copied during its JOINING state.

The exception, for those interested:

WARN  [StreamReceiveTask:6] 2016-04-25 06:48:51,107  StreamResultFuture.java:207 - [Stream #bb34c010-0a1b-11e6-a009-d100b9716be2] Stream failed
INFO  [MemtableFlushWriter:214] 2016-04-25 06:48:51,107  Memtable.java:382 - Completed flushing /mnt/cassandra/data/system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/system-compactions_in_progress-tmp-ka-276-Data.db (0.000KiB) for commitlog position ReplayPosition(segmentId=1461502431578, position=9474892)
INFO  [CompactionExecutor:259] 2016-04-25 06:48:51,252  CompactionTask.java:141 - Compacting [SSTableReader(path='/mnt/cassandra/data/trends/stream_trends-a5bb42a07e2911e58fd6f3cfff022ad4/trends-stream_trends-ka-79-Data.db'), SSTableReader(path='/mnt/cassandra/data/trends/stream_trends-a5bb42a07e2911e58fd6f3cfff022ad4/trends-stream_trends-ka-87-Data.db')]
ERROR [main] 2016-04-25 06:48:51,270  CassandraDaemon.java:581 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1166) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:944) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:740) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:617) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:389) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:336) ~[dse-core-4.8.6.jar:4.8.6]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:564) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at com.datastax.bdp.DseModule.main(DseModule.java:74) [dse-core-4.8.6.jar:4.8.6]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1172) ~[guava-16.0.1.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.1.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-16.0.1.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-16.0.1.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-16.0.1.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:692) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:653) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:179) ~[cassandra-all-2.1.13.1218.jar:2.1.13.1218]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_77]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_77]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_77]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_77]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
koblas
  • How much data is there in total on the cluster and on each node? Maybe just kill the new node (reset the keyspace data without touching the system keyspaces) and try joining again. Also look at the value of the streaming throughput (nodetool getstreamthroughput) – doanduyhai Apr 28 '16 at 07:17
  • Streaming throughput is 0. The problem is that the file from 10.0.10.54 causes the node to crash. – koblas Apr 28 '16 at 16:34
  • I can see from the stack trace that you're using DataStax Enterprise. Contact the DataStax people, I'm sure they can help. – doanduyhai Apr 28 '16 at 18:58

1 Answer


You can't skip the JOINING stage while a node is bootstrapping. I'm guessing you were following these steps to replace the node? https://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

Were all of your nodes online throughout the entire streaming process? If a replica crashes or goes offline while streaming, it can cause the stream to fail. If your nodes are very low on disk space, that can cause Cassandra to act in odd ways or crash. If this is the case, you may need to add additional storage to your existing node before bootstrapping the new one.
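Before retrying the bootstrap, it's worth confirming that every replica is up and has headroom. A minimal check (run nodetool status from any node and df on each node; the data path below is taken from your log):

    # All nodes should report UN (Up/Normal)
    nodetool status

    # Free space on the data volume of each node
    df -h /mnt/cassandra/data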

You can add more disk space to your existing node like this (a rough shell sketch follows the steps):

  1. Stop Cassandra
  2. Attach a larger disk to the machine/VM
  3. Copy the cassandra data directory (/var/lib/cassandra/data) to the new disk
  4. Change the cassandra data directory mount point to the new disk using a symlink
  5. Start Cassandra up
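
A rough shell sketch of those steps, assuming the larger disk is already mounted at /mnt/bigdisk and Cassandra runs as a system service (paths and the service name are assumptions; on DSE the service is dse):

    sudo service cassandra stop                                   # 1. stop Cassandra
    # 2. the larger disk is assumed attached and mounted at /mnt/bigdisk
    sudo rsync -a /var/lib/cassandra/data/ /mnt/bigdisk/data/     # 3. copy the data directory
    sudo mv /var/lib/cassandra/data /var/lib/cassandra/data.old   # keep the original as a fallback
    sudo ln -s /mnt/bigdisk/data /var/lib/cassandra/data          # 4. symlink the old path to the new disk
    sudo chown -R cassandra:cassandra /mnt/bigdisk/data           # Cassandra must own the new location
    sudo service cassandra start                                  # 5. start Cassandra back up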
Justin Cameron