
Currently I have 20 million records that I want to insert into my table in a Cassandra database. Each record is around 1 KB in size.

Currently, for each record I create a PreparedStatement (com.datastax.driver.core) and execute it to transfer the data to the table (via a com.datastax.driver.core.Session).
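In code, the loop looks roughly like this (the keyspace, table, and column names are simplified stand-ins for my actual schema):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("my_keyspace");

for (int i = 0; i < 20_000_000; i++) {
    // One PreparedStatement per record, executed synchronously
    // (normally you would prepare once and reuse it).
    PreparedStatement ps = session.prepare(
            "INSERT INTO my_table (id, payload) VALUES (?, ?)");
    session.execute(ps.bind(i, "payload-" + i)); // blocks until this write completes
}
```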

The whole process takes around 5 to 6 hours to finish. I have 3 Cassandra nodes (using HDDs). To my understanding, what I'm doing is a serial insert operation.

My question is: is there anything I can do to speed up the whole insertion process?

  • Possible duplicate of [Cassandra: Load large data fast](http://stackoverflow.com/questions/23530703/cassandra-load-large-data-fast) – Raedwald Jan 22 '16 at 12:20

1 Answer


You are probably using normal (synchronous) statements, which are fine for a few queries but definitely not for your use case; you need to use asynchronous queries to get proper performance.
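As a minimal sketch of what that could look like with the DataStax Java driver: prepare the statement once, fire inserts with `executeAsync`, and cap the number of in-flight requests so the cluster isn't overwhelmed. The keyspace, table, and column names here are made up, and the in-flight limit of 256 is just a starting point to tune for your cluster:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import java.util.concurrent.Semaphore;

public class AsyncLoader {
    public static void main(String[] args) throws InterruptedException {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_keyspace"); // hypothetical keyspace

        // Prepare once, outside the loop, and reuse the statement for every insert.
        PreparedStatement ps = session.prepare(
                "INSERT INTO my_table (id, payload) VALUES (?, ?)");

        // Cap concurrent in-flight requests instead of queueing 20M futures at once.
        final int MAX_IN_FLIGHT = 256;
        Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

        for (int i = 0; i < 20_000_000; i++) {
            inFlight.acquire();
            ResultSetFuture f = session.executeAsync(ps.bind(i, "payload-" + i));
            // Release the permit when the write completes, success or failure.
            f.addListener(inFlight::release, Runnable::run);
        }

        // Drain: wait for the remaining writes to finish before shutting down.
        inFlight.acquire(MAX_IN_FLIGHT);
        cluster.close();
    }
}
```

`executeAsync` returns immediately, so the driver can pipeline many writes over its connection pool instead of paying a full round trip per record.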

I used to load huge datasets with the SSTableLoader, but I ended up with so much inconsistent data, and the same queries returning different results, that I won't recommend it.

  • Thank you for the advice about SSTableLoader! I will try your suggestion :) – Xitrum Jan 22 '16 at 12:46
  • But is it safe when inserting with asynchronous queries? Do we have to wait for the result from ResultSetFuture to make sure the query executed successfully? – Xitrum Jan 22 '16 at 14:12
  • You don't need to wait for anything because you are just inserting. Simply log into cqlsh and run some queries to make sure it went fine (see the sketch after these comments for checking failures in code) – Will Jan 22 '16 at 14:22
  • You mean checking how many records were inserted x minutes ago? – Xitrum Jan 22 '16 at 14:42
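On the safety question above: you don't have to block on each write, but rather than discarding the ResultSetFuture you can attach a callback so failed inserts are at least logged (or retried). A minimal sketch, assuming `session` and `ps` are the objects from the answer's example and the bound values are placeholders; Guava ships with the driver:

```java
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;

ResultSetFuture f = session.executeAsync(ps.bind(42, "payload-42"));
Futures.addCallback(f, new FutureCallback<ResultSet>() {
    @Override
    public void onSuccess(ResultSet rs) {
        // The write was acknowledged at the requested consistency level.
    }

    @Override
    public void onFailure(Throwable t) {
        // Log (and optionally retry) the failed insert instead of losing it silently.
        System.err.println("Insert failed: " + t);
    }
}, MoreExecutors.directExecutor());
```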