I set up a Cassandra cluster with two DCs, one in East US and one in West US. There is no VPN or gateway between them. Every time I restart the whole cluster, nodetool describecluster reports everything as normal, and a long read at consistency level ALL works fine.
However, after a few minutes, nodetool describecluster on each node shows UNREACHABLE: [xxx.xxx, ...]. The unreachable set grows gradually until the two DCs cannot reach each other at all, and read requests fail with ReadTimeoutException.
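To make "grows gradually" measurable, the unreachable set can be extracted from each describecluster run and compared with the previous one. This is only a minimal sketch: it assumes the line looks like the UNREACHABLE: [addr, addr] form quoted above, and the function names parse_unreachable and newly_unreachable are made up for illustration.

```python
import re

# Matches lines of the form "UNREACHABLE: [a.b.c.d, e.f.g.h]" (assumed format,
# taken from the output quoted in the question).
_UNREACHABLE_RE = re.compile(r"UNREACHABLE:\s*\[([^\]]*)\]")


def parse_unreachable(describecluster_output):
    """Return the set of addresses listed as UNREACHABLE in the given text."""
    unreachable = set()
    for match in _UNREACHABLE_RE.finditer(describecluster_output):
        for address in match.group(1).split(","):
            address = address.strip()
            if address:
                unreachable.add(address)
    return unreachable


def newly_unreachable(previous, current):
    """Return the addresses that became unreachable since the previous sample."""
    return current - previous
```

The text passed to parse_unreachable could come from subprocess.run(["nodetool", "describecluster"], capture_output=True, text=True).stdout, sampled every minute or so, which would show whether nodes drop out one DC at a time or in some other pattern.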
Meanwhile, running nodetool status on any node still reports all nodes as Up and Normal. I can also ssh into a node and ping the other DC without any problem.
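Since ping only exercises ICMP, a closer check is a TCP connect to the inter-node port of each peer in the other DC. The sketch below assumes the cluster uses the default storage_port of 7000 (7001 if inter-node encryption is enabled), which the config excerpt in this post does not confirm. The helper names and any peer addresses passed in are placeholders.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_peers(peers, port=7000, timeout=3.0):
    """Map each peer address to whether its inter-node port accepts connections.

    port=7000 is an assumption (Cassandra's default storage_port).
    """
    return {host: can_connect(host, port, timeout) for host in peers}
```

Running check_peers from a node in one DC against the public IPs of the other DC, both right after a restart and again once nodes start showing up as UNREACHABLE, would show whether the port itself stops accepting connections over time.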
Neither increasing read_request_timeout_in_ms nor tuning GC solves the problem.
Any ideas why this can happen?
cassandra.yaml:
listen_address: {{private_ip}}
rpc_address: 0.0.0.0
broadcast_rpc_address: {{public_ip}}
broadcast_address: {{public_ip}}