I have a Java list of 100,000 names that I'd like to ingest into a 3-node Cassandra cluster running DataStax Enterprise 5.1 with Cassandra 3.10.0.
My code ingests the data, but it takes a very long time. A stress test on the cluster sustained over 25,000 writes per second, yet my ingest code manages only around 200 writes per second.
The list is called myList. I use the following prepared statement and session execution to ingest the data:
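    // Prepare the insert once, then bind and execute it for every name
    PreparedStatement prepared = session.prepare("insert into names (id, name) values (?, ?)");
    int id = 0;
    for (int i = 0; i < myList.size(); i++) {
        id += 1;
        // session.execute() is synchronous: it returns only after this write completes
        session.execute(prepared.bind(id, myList.get(i)));
    }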
I added a cluster monitor to my code to see what was going on. Here is my monitoring code.
    // Monitor the status of the cluster
    final LoadBalancingPolicy loadBalancingPolicy =
            cluster.getConfiguration().getPolicies().getLoadBalancingPolicy();
    ScheduledExecutorService scheduled =
            Executors.newScheduledThreadPool(1);
    scheduled.scheduleAtFixedRate(() -> {
        Session.State state = session.getState();
        state.getConnectedHosts().forEach((host) -> {
            HostDistance distance = loadBalancingPolicy.distance(host);
            int connections = state.getOpenConnections(host);
            int inFlightQueries = state.getInFlightQueries(host);
            System.out.printf("%s connections=%d, current load=%d, maxload=%d%n",
                    host, connections, inFlightQueries,
                    connections *
                            poolingOptions.getMaxRequestsPerConnection(distance));
        });
    }, 5, 5, TimeUnit.SECONDS);
The monitor prints every 5 seconds. Here is its output for 3 iterations:
/192.168.20.25:9042 connections=1, current load=1, maxload=32768
/192.168.20.26:9042 connections=1, current load=0, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768
/192.168.20.25:9042 connections=1, current load=1, maxload=32768
/192.168.20.26:9042 connections=1, current load=0, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768
/192.168.20.25:9042 connections=1, current load=0, maxload=32768
/192.168.20.26:9042 connections=1, current load=1, maxload=32768
/192.168.20.34:9042 connections=1, current load=0, maxload=32768
It doesn't appear that I am utilizing my cluster very effectively. I'm not sure what I am doing wrong and would greatly appreciate any tips.
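For reference, the monitor never shows more than one query in flight across the three hosts, which is what a loop that waits for each session.execute() call to return would produce. At roughly 200 writes per second, that works out to about 5 ms per round trip. The sketch below uses only JDK classes to illustrate the general idea of keeping many writes in flight while capping how many are outstanding. It is not driver code: fakeWrite is a hypothetical stand-in for an asynchronous call such as the driver's session.executeAsync, and the class name, the simulated delay, and the in-flight limits are assumptions chosen purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedAsyncIngest {

    // Hypothetical stand-in for an asynchronous write. The returned future
    // completes after delayMillis, imitating one network round trip.
    // id and name are accepted only to mirror the shape of the real insert.
    public static CompletableFuture<Void> fakeWrite(
            ScheduledExecutorService timer, int id, String name, long delayMillis) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        timer.schedule(() -> { done.complete(null); }, delayMillis, TimeUnit.MILLISECONDS);
        return done;
    }

    // Issues one simulated write per name, allowing at most maxInFlight
    // writes to be outstanding at once. Returns the number that succeeded.
    public static int ingest(List<String> names, int maxInFlight, long delayMillis)
            throws InterruptedException {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(1);
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger completed = new AtomicInteger();
        try {
            int id = 0;
            for (String name : names) {
                id += 1;
                permits.acquire(); // blocks only when maxInFlight writes are pending
                fakeWrite(timer, id, name, delayMillis).whenComplete((result, error) -> {
                    if (error == null) {
                        completed.incrementAndGet();
                    }
                    permits.release(); // free the slot whether the write succeeded or not
                });
            }
            // Taking every permit means no write is still outstanding.
            permits.acquire(maxInFlight);
            permits.release(maxInFlight);
            return completed.get();
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            names.add("name-" + i);
        }
        // Compare one-at-a-time submission with a cap of 128 outstanding writes.
        for (int maxInFlight : new int[] {1, 128}) {
            long start = System.nanoTime();
            int written = ingest(names, maxInFlight, 2);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.printf("maxInFlight=%d wrote %d names in %d ms%n",
                    maxInFlight, written, elapsedMs);
        }
    }
}
```

With maxInFlight set to 1 the sketch behaves like the one-at-a-time loop, so each write pays the full simulated round trip. A larger cap lets the round trips overlap. With the real driver, any such cap would presumably need to stay well under the maxload value the monitor reports.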
Thank you!