I'm using Apache Cassandra 2.2.4. I have a four-node cluster with a replication factor of 3 in DC1 and 1 in DC2, where DC1 contains three nodes and DC2 contains one node. There were more nodes in this cluster before, but I removed them for various reasons and didn't alter the replication settings afterwards. [Please consider that the following IPs are not the originals]

Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  21.12.19.91    4.08 GB    256          ?       a45bb676-1ddd-4b22-933b-58653cea680f  RAC1
UN  21.12.19.92    3.92 GB    256          ?       a7735fca-8671-4a20-a759-4a2681aed37e  RAC1
UN  21.12.19.93    4.47 GB    256          ?       d98f3cad-881a-41c8-89c7-170c63c3d236  RAC1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  21.12.19.99    3.84 GB    256          ?       ccd9ca97-f97a-4473-9a65-49b12a1b60ba  RAC1
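
For reference, the keyspace replication can be confirmed programmatically to make sure it still matches this topology; a minimal sketch assuming the DataStax Java driver (2.1+), where `my_keyspace` and the contact point are placeholders:

```java
import java.util.Map;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.KeyspaceMetadata;

public class ReplicationCheck {
    public static void main(String[] args) {
        // Any live node from the cluster works as a contact point.
        Cluster cluster = Cluster.builder()
                .addContactPoint("21.12.19.91")
                .build();
        try {
            // "my_keyspace" is a placeholder; substitute the real keyspace name.
            KeyspaceMetadata ks = cluster.getMetadata().getKeyspace("my_keyspace");
            // For the topology above one would expect something like:
            // {class=org.apache.cassandra.locator.NetworkTopologyStrategy, DC1=3, DC2=1}
            Map<String, String> replication = ks.getReplication();
            System.out.println(replication);
        } finally {
            cluster.close();
        }
    }
}
```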

The cluster was working fine, but lately I've been seeing the following INFO message in the logs. I've been trying to analyze the issue but haven't been able to pin it down yet. Is anyone familiar with the following scenario?

INFO  [SharedPool-Worker-2] 2017-02-26 06:56:48,520 Message.java:605 - Unexpected exception during request; channel = [id: 0x637a702c, /18.12.10.17:60926 :> /21.12.19.91:9042]
java.io.IOException: Error while read(...): Connection reset by peer
    at io.netty.channel.epoll.Native.readAddress(Native Method) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.doReadBytes(EpollSocketChannel.java:675) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollSocketChannel$EpollSocketUnsafe.epollInReady(EpollSocketChannel.java:714) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:326) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:264) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137) ~[netty-all-4.0.23.Final.jar:4.0.23.Final]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_66]

1 Answer

Please make sure that your firewall is not dropping TCP connections that are still in use. The TCP keepalive interval on all nodes must be less than the firewall's idle-connection timeout. Please refer to https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html for details on the TCP settings. This helped me resolve the issue.
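
On the client side, the driver can also request TCP keepalive on its own connections so that an idle firewall timeout does not silently kill them. A minimal sketch, assuming the DataStax Java driver (2.x/3.x) and using one of the node IPs above as a contact point:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SocketOptions;

public class KeepAliveClient {
    public static void main(String[] args) {
        // Ask the driver to enable SO_KEEPALIVE on its connections; the
        // actual probe timing is still governed by the OS-level
        // tcp_keepalive_* settings from the linked DataStax page.
        SocketOptions socketOptions = new SocketOptions()
                .setKeepAlive(true);

        Cluster cluster = Cluster.builder()
                .addContactPoint("21.12.19.91")
                .withSocketOptions(socketOptions)
                .build();

        try (Session session = cluster.connect()) {
            session.execute("SELECT release_version FROM system.local");
        } finally {
            cluster.close();
        }
    }
}
```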

  • I've been through that from the very beginning of this issue. I have sufficient **keepalive_time**, **keepalive_probes**, and **keepalive_intvl** delays configured (see the sketch after these comments for checking the live values), so the connections shouldn't be dropped. – Anower Perves Mar 02 '17 at 13:40
  • In that case, it might also be that your application or the other nodes have cached the IPs of Cassandra nodes that were present previously and have since been decommissioned. From the _INFO_ log, **18.12.10.17:60926** seems to have been removed but is still being connected to. Please do a rolling restart of all the nodes; that should resolve the issue. – Vincent Khedkar Mar 06 '17 at 21:43
  • **18.12.10.17:60926** is a client-side/developer-side IP. We don't use IPs from different blocks for a single cluster. We also did a rolling restart but couldn't find out where the issue comes from. – Anower Perves Mar 07 '17 at 08:35
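
For reference, the three kernel parameters mentioned in the comments above can be read directly from `/proc` on Linux to verify what each node is actually running with; a small sketch (Linux-only paths, run on each node):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TcpKeepAliveSettings {
    public static void main(String[] args) throws IOException {
        // Linux exposes the TCP keepalive tunables under /proc/sys.
        String[] params = {
                "net/ipv4/tcp_keepalive_time",
                "net/ipv4/tcp_keepalive_probes",
                "net/ipv4/tcp_keepalive_intvl"
        };
        for (String p : params) {
            String value = new String(
                    Files.readAllBytes(Paths.get("/proc/sys/" + p))).trim();
            System.out.println(p.replace('/', '.') + " = " + value);
        }
    }
}
```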