First of all, CPU usage isn't 20%. System CPU is at 20%, but user CPU is at around 70%, so the CPU is busy roughly 90% of the time. For the difference between user CPU and system CPU, see: User CPU time vs System CPU time?
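To make the user/system split concrete, here is a minimal Python sketch that computes the share of each CPU state from two snapshots of the aggregate `cpu` line in `/proc/stat` on Linux. The function name `cpu_shares` and the two snapshot strings are hypothetical; the counter values are made up so that the result mirrors the figures discussed here, they are not measurements from the system in question.

```python
# Order of the first eight counters on the "cpu" line of /proc/stat (Linux).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]

def cpu_shares(before: str, after: str) -> dict:
    """Percentage of time spent in each CPU state between two 'cpu' lines."""
    a = [int(x) for x in before.split()[1:9]]
    b = [int(x) for x in after.split()[1:9]]
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    return {name: 100.0 * d / total for name, d in zip(FIELDS, deltas)}

# Hypothetical snapshots taken some time apart (illustrative values only).
before = "cpu 1000 0 300 500 50 0 10 0 0 0"
after = "cpu 8000 0 2300 800 650 0 110 0 0 0"

shares = cpu_shares(before, after)
busy = 100.0 - shares["idle"] - shares["iowait"]
print(f"user={shares['user']:.0f}% system={shares['system']:.0f}% busy={busy:.0f}%")
```

Looking only at the `system` share here would suggest a mostly idle machine, while the combined figure shows it is close to saturated.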
Secondly, iostat invoked without arguments isn't the best way of looking at disk usage. From Basic I/O Monitoring on Linux:
Without a specified interval, iostat displays statistics since the
system was up then exits, which is not useful in our case.
For a more comprehensive look at the system, use:
dstat -rcdgilmnps 60

Now we can clearly see the averages over the last minute: CPU idle is at 1-4%, and the disk is handling ~340 I/Os with a write speed of about 15M.
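As a rough sanity check on those two disk figures, the average request size can be worked out directly. This is only a back-of-the-envelope sketch: it assumes that both numbers are per-second rates, that "15M" means 15 MiB, and that essentially all of the ~340 requests are writes.

```python
# Figures quoted from the dstat output discussed above.
io_requests_per_s = 340             # "~340 ios", assumed to be mostly writes
write_bytes_per_s = 15 * 1024 ** 2  # "15M", assumed to mean 15 MiB/s

avg_request_bytes = write_bytes_per_s / io_requests_per_s
avg_request_kib = avg_request_bytes / 1024
print(f"average request size: {avg_request_kib:.1f} KiB")  # 45.2 KiB
```

Under these assumptions each request carries a few tens of KiB, so the disk is being fed fairly large writes rather than many tiny ones.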
The next useful tool is nodetool cfstats:

Here we can see statistics for a particular table. The write latency is particularly interesting: it is 1.5 ms.
Finally, here is the trace of a single write:
id: 12345 -> host NodeAsked:9042, achieved consistency: LocalOne
Sending MUTATION message to /NodeA on NodeAsked[MessagingService-Outgoing-/NodeA] at 0
Sending MUTATION message to /NodeB on NodeAsked[MessagingService-Outgoing-/NodeB] at 0
REQUEST_RESPONSE message received from /NodeA on NodeAsked[MessagingService-Incoming-/NodeA] at 0
Processing response from /NodeA on NodeAsked[SharedPool-Worker-32] at 0
MUTATION message received from /NodeAsked on NodeA[MessagingService-Incoming-/NodeAsked] at 12
Determining replicas for mutation on NodeAsked[SharedPool-Worker-45] at 114
Appending to commitlog on NodeAsked[SharedPool-Worker-45] at 183
Adding to mytable memtable on NodeAsked[SharedPool-Worker-45] at 241
Appending to commitlog on NodeA[SharedPool-Worker-5] at 5360
Adding to mytable memtable on NodeA[SharedPool-Worker-5] at 5437
Enqueuing response to /NodeAsked on NodeA[SharedPool-Worker-5] at 5527
Sending REQUEST_RESPONSE message to /NodeAsked on NodeA[MessagingService-Outgoing-/NodeAsked] at 5739
This shows that what's limiting us is the storage speed. It's best to run several ad hoc writes with tracing enabled under normal write load to look for patterns.
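When comparing several traces, it helps to find the largest gap between consecutive steps on each node automatically. The following Python sketch does that for the trace above. It assumes the trailing "at N" values are elapsed microseconds measured per node (as with `source_elapsed` in Cassandra's tracing tables) and that every event line has the shape `<activity> on <node>[<thread>] at <N>`; the regular expression and the helper name `slowest_gaps` are illustrative, not part of any Cassandra tooling.

```python
import re
from collections import defaultdict

# Event lines copied from the trace above (the "id: ..." header is omitted).
TRACE = """\
Sending MUTATION message to /NodeA on NodeAsked[MessagingService-Outgoing-/NodeA] at 0
Sending MUTATION message to /NodeB on NodeAsked[MessagingService-Outgoing-/NodeB] at 0
REQUEST_RESPONSE message received from /NodeA on NodeAsked[MessagingService-Incoming-/NodeA] at 0
Processing response from /NodeA on NodeAsked[SharedPool-Worker-32] at 0
MUTATION message received from /NodeAsked on NodeA[MessagingService-Incoming-/NodeAsked] at 12
Determining replicas for mutation on NodeAsked[SharedPool-Worker-45] at 114
Appending to commitlog on NodeAsked[SharedPool-Worker-45] at 183
Adding to mytable memtable on NodeAsked[SharedPool-Worker-45] at 241
Appending to commitlog on NodeA[SharedPool-Worker-5] at 5360
Adding to mytable memtable on NodeA[SharedPool-Worker-5] at 5437
Enqueuing response to /NodeAsked on NodeA[SharedPool-Worker-5] at 5527
Sending REQUEST_RESPONSE message to /NodeAsked on NodeA[MessagingService-Outgoing-/NodeAsked] at 5739
"""

# Assumed line shape: "<activity> on <node>[<thread>] at <elapsed>"
LINE = re.compile(r"^(?P<activity>.+) on (?P<node>\w+)\[(?P<thread>[^\]]+)\] at (?P<us>\d+)$")

def slowest_gaps(trace: str) -> list:
    """Return (gap, node, previous step, next step) tuples, largest gap first."""
    events = defaultdict(list)
    for line in trace.splitlines():
        m = LINE.match(line.strip())
        if m:
            events[m["node"]].append((int(m["us"]), m["activity"]))
    gaps = []
    for node, evs in events.items():
        evs.sort(key=lambda e: e[0])  # stable: ties keep their trace order
        for (t0, a0), (t1, a1) in zip(evs, evs[1:]):
            gaps.append((t1 - t0, node, a0, a1))
    return sorted(gaps, reverse=True)

gaps = slowest_gaps(TRACE)
for gap, node, prev_step, next_step in gaps[:3]:
    print(f"{gap:>6} on {node}: {prev_step!r} -> {next_step!r}")
```

For this trace the largest gap is on NodeA, between receiving the mutation and appending to the commit log. Running the same function over several traces makes it easier to tell whether that step is consistently the slow one.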