I have a nearly empty three-node Cassandra cluster running v2.0.9, and I ran nodetool repair on one node:
$ nodetool repair
[2014-10-11 13:11:17,862] Starting repair command #1, repairing 768 ranges for keyspace myks
This seems to hang forever (at least a few hours). However, if I look in /var/log/cassandra/system.log, I see the following:
$ date
Sat Oct 11 15:55:02 PDT 2014
$ grep 'session completed successfully' /var/log/cassandra/system.log | head -5
INFO [AntiEntropySessions:1] 2014-10-11 13:11:21,023 RepairSession.java (line 282) [repair #c281ead0-5182-11e4-a96f-63ead2d0f3f0] session completed successfully
INFO [AntiEntropySessions:2] 2014-10-11 13:11:23,544 RepairSession.java (line 282) [repair #c46020b0-5182-11e4-a96f-63ead2d0f3f0] session completed successfully
INFO [AntiEntropySessions:3] 2014-10-11 13:11:25,631 RepairSession.java (line 282) [repair #c5df46a0-5182-11e4-a96f-63ead2d0f3f0] session completed successfully
INFO [AntiEntropySessions:4] 2014-10-11 13:11:32,216 RepairSession.java (line 282) [repair #c71ef290-5182-11e4-a96f-63ead2d0f3f0] session completed successfully
INFO [AntiEntropySessions:1] 2014-10-11 13:11:38,950 RepairSession.java (line 282) [repair #cb0b9610-5182-11e4-a96f-63ead2d0f3f0] session completed successfully
$ grep 'session completed successfully' /var/log/cassandra/system.log | tail -5
INFO [AntiEntropySessions:3] 2014-10-11 13:30:49,527 RepairSession.java (line 282) [repair #7b9e2720-5185-11e4-a96f-63ead2d0f3f0] session completed successfully
INFO [AntiEntropySessions:2] 2014-10-11 13:30:51,671 RepairSession.java (line 282) [repair #7cdbd740-5185-11e4-a96f-63ead2d0f3f0] session completed successfully
INFO [AntiEntropySessions:4] 2014-10-11 13:30:56,139 RepairSession.java (line 282) [repair #7e232450-5185-11e4-a96f-63ead2d0f3f0] session completed successfully
INFO [AntiEntropySessions:1] 2014-10-11 13:30:58,633 RepairSession.java (line 282) [repair #80ccc080-5185-11e4-a96f-63ead2d0f3f0] session completed successfully
INFO [AntiEntropySessions:3] 2014-10-11 13:31:01,744 RepairSession.java (line 282) [repair #82497570-5185-11e4-a96f-63ead2d0f3f0] session completed successfully
$ grep 'session completed successfully' /var/log/cassandra/system.log | wc -l
357
This suggests that nodetool repair was moving along swiftly, completing the first 357 of (I assume) 768 repair sessions in about 20 minutes, and then hung for some reason.
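For reference, the three grep commands above can be rolled into one small helper that reports how many sessions completed and when the first and last ones finished. This is only a rough sketch: `repair_progress` is a name I made up, it assumes the log line layout shown above (date and time in the third and fourth whitespace-separated fields), and the total number of ranges has to be passed in by hand from the "Starting repair command" line.

```shell
# Summarize repair progress from a Cassandra system.log.
# Hypothetical helper; assumes the log layout shown above, where
# field 3 is the date and field 4 is the time of each entry.
repair_progress() {
    log_file="$1"       # path to system.log
    total_ranges="$2"   # number of ranges reported when the repair started

    grep 'session completed successfully' "$log_file" |
    awk -v total="$total_ranges" '
        NR == 1 { first = $3 " " $4 }   # timestamp of first completed session
                { last  = $3 " " $4 }   # overwritten until the final match
        END {
            printf "completed=%d of %d\n", NR, total
            printf "first=%s\n", first
            printf "last=%s\n", last
        }'
}
```

Running `repair_progress /var/log/cassandra/system.log 768` and comparing the `last=` timestamp with `date` shows how long the repair has gone without finishing another session.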
How can I diagnose this? Nothing immediately jumps out at me in the logs, but there's a lot of information in them.