
While doing a bulk load of data, incrementing counters based on log data, I am encountering a timeout exception. I'm using the DataStax 2.0-rc2 Java driver.

Is this an issue with the server not being able to keep up (i.e. a server-side config issue), or is this an issue with the client getting bored waiting for the server to respond? Either way, is there an easy config change I can make that would fix this?

Exception in thread "main" com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
    at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:187)
    at com.datastax.driver.core.Session.execute(Session.java:126)
    at jason.Stats.analyseLogMessages(Stats.java:91)
    at jason.Stats.main(Stats.java:48)
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:92)
    at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(ResultSetFuture.java:122)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:224)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:373)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:510)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:53)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:33)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:165)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
    ... 21 more

One of the nodes reports this at roughly the time it occurred:

ERROR [Native-Transport-Requests:12539] 2014-02-16 23:37:22,191 ErrorMessage.java (line 222) Unexpected exception during request
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(Unknown Source)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.read(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Jay

4 Answers


While I don't understand the root cause of this issue, I was able to solve the problem by increasing the timeout value in the conf/cassandra.yaml file.

write_request_timeout_in_ms: 20000
Jay
  • I faced the same problem once. I was using `BatchStatement` to write data to Cassandra. My batch size was 10000. After reducing the batch size, I no longer got the exception. So maybe you are trying to load too much data into Cassandra in a single request. – abi_pat Sep 24 '15 at 13:25
  • This is actually a very poor choice. Did you ever find out why this was happening? I'm facing the same error now. – iMajna Feb 10 '18 at 16:25
  • @Superbrain_bug Thanks for sharing your judgment of this workaround. I am sure that some people might find your judgement interesting. If you find an alternate solution to this issue, I am sure everyone would like to know about it. – Jay Feb 24 '18 at 01:53
  • One of the reasons might be that Cassandra is running memory-intensive internal processes such as compaction or repair, and you just don't have enough memory to complete a write within 2 s. This happens to me very often during development: it works well for 10-15 minutes, then this error appears and I have to restart Cassandra. Very annoying. – walv Nov 12 '20 at 16:02
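abi_pat's suggestion of sending smaller batches can be sketched as a plain chunking helper. This is a minimal illustration, not DataStax driver code: `BatchChunker` and `chunk` are hypothetical names, the items stand in for statements, and with the driver each chunk would presumably be added to its own `BatchStatement` before executing it. The chunk size of 100 is an arbitrary example, not a recommended value.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {

    /**
     * Splits a list into consecutive sub-lists of at most maxSize elements.
     * Each sub-list would then be sent as its own, smaller batch.
     */
    public static <T> List<List<T>> chunk(List<T> items, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxSize) {
            int end = Math.min(start + maxSize, items.size());
            // Copy so each chunk does not depend on the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-ins for 10000 statements, matching the batch size in the comment above.
        List<Integer> statements = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            statements.add(i);
        }
        List<List<Integer>> batches = chunk(statements, 100);
        // prints: 100 batches of up to 100 statements
        System.out.println(batches.size() + " batches of up to 100 statements");
    }
}
```

Smaller batches mean each request asks the coordinator to apply less work before the write timeout expires, which is consistent with the comment's experience, though it does not address a node that is stalled for other reasons.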

We experienced similar problems on a single node in an ESX cluster with SAN storage attached (which is not recommended by DataStax, but we have no other options at the moment).

Note: the settings below can be a big blow to the maximum performance Cassandra can achieve, but we chose a stable system over high performance.

While running iostat -xmt 1, we found high w_await times at the same moments the WriteTimeoutExceptions occurred. It turned out the memtable could not be written to disk within the default write_request_timeout_in_ms: 2000 setting.

We significantly reduced the memtable size from 512 MB (the default is 25% of the heap, which was 2 GB in our case) to 32 MB:

# Total permitted memory to use for memtables. Cassandra will stop
# accepting writes when the limit is exceeded until a flush completes,
# and will trigger a flush based on memtable_cleanup_threshold
# If omitted, Cassandra will set both to 1/4 the size of the heap.
# memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 32

We also slightly increased the write timeout to 3 seconds:

write_request_timeout_in_ms: 3000

Also make sure you write regularly to disk if you have high IO wait times:

#commitlog_sync: batch
#commitlog_sync_batch_window_in_ms: 2
#
# the other option is "periodic" where writes may be acked immediately
# and the CommitLog is simply synced every commitlog_sync_period_in_ms
# milliseconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

These settings allowed the memtable to remain small and be flushed often. The exceptions went away and the system survived the stress tests we ran on it.

dvtoever

It is the coordinator (i.e. the server) timing out while waiting for acknowledgements of the write.

  • Hi Chris, how can I debug further to find out why the ACK did not come? I am facing a similar issue and am trying to find the root cause. Thanks. – opstalj May 12 '16 at 06:11

It's worth double-checking your GC settings for Cassandra.

In my case I was using a semaphore to throttle async writes and still (sometimes) getting timeouts.

It transpired that I was using unsuitable GC settings: I'd been using cassandra-unit for convenience, which had the unintended consequence of running with the default VM settings. Consequently, we would eventually hit a stop-the-world GC, resulting in a write timeout. After applying the same GC settings as my running Cassandra Docker image, all is fine.

This might be an uncommon cause, but it would have helped me, so it seems worth recording here.
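For reference, semaphore-based throttling of async writes, as described in this answer, might look roughly like the sketch below. It uses only `java.util.concurrent`; `ThrottledWriter`, `writeAsync` and `demo` are hypothetical names, and `writeAsync` merely runs a `Runnable` on a thread pool in place of a real asynchronous write (with the DataStax driver that would be `Session.executeAsync`, whose future would need the permit released in a completion callback). The limit of 4 in-flight writes is an arbitrary example. As the answer notes, throttling bounds the load the client generates, but it cannot prevent timeouts caused by the server pausing in GC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledWriter {

    private final Semaphore permits;
    private final ExecutorService executor;

    public ThrottledWriter(int maxInFlight, ExecutorService executor) {
        this.permits = new Semaphore(maxInFlight);
        this.executor = executor;
    }

    /** Hypothetical stand-in for a real asynchronous write call. */
    private CompletableFuture<Void> writeAsync(Runnable write) {
        return CompletableFuture.runAsync(write, executor);
    }

    /** Blocks until a permit is free, then starts the write; the permit is released when it finishes. */
    public CompletableFuture<Void> submit(Runnable write) throws InterruptedException {
        permits.acquire();
        CompletableFuture<Void> future;
        try {
            future = writeAsync(write);
        } catch (RuntimeException e) {
            permits.release(); // submission failed, so give the permit back immediately
            throw e;
        }
        // Release on success and on failure alike, so a failed write cannot leak a permit.
        return future.whenComplete((ignored, error) -> permits.release());
    }

    /** Runs `writes` simulated 5 ms writes and returns the highest concurrency observed. */
    static int demo(int maxInFlight, int writes) {
        ExecutorService pool = Executors.newFixedThreadPool(16);
        ThrottledWriter writer = new ThrottledWriter(maxInFlight, pool);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < writes; i++) {
                futures.add(writer.submit(() -> {
                    maxSeen.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    try {
                        Thread.sleep(5); // pretend the write takes a little while
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    inFlight.decrementAndGet();
                }));
            }
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while submitting writes", e);
        } finally {
            pool.shutdown();
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent writes: " + demo(4, 100));
    }
}
```

Even though the pool has 16 threads, no more than 4 simulated writes run at once, because each submission must first obtain one of the 4 permits.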

Mumrah