I am trying not to use the COPY command because the database is quite big. I am using Talend, the open-source ETL tool, but haven't found a solution yet.
2 Answers
The three most common options for migrating data to Cassandra are:
1. A custom Spark job. This demands some programming, but it is the most scalable solution and lets you apply any custom data transformation logic. You will probably need some transformation, as I can't imagine you'll have exactly the same table structure in an RDBMS and in key-value storage.
2. Sqoop from the DataStax Enterprise package (it includes a custom driver for Cassandra).
3. Sqoop with Cassandra's JDBC drivers. I have no idea about the features and stability of the latest Cassandra JDBC driver version, though. We had some issues with the earlier ones.
OK, there is a fourth one. You can write your own simple stand-alone data migration tool (in Java, for instance). The tool reads the data from Postgres row by row and issues Cassandra inserts. That would be extremely slow, though rather simple.
You've mentioned that the database is quite big, but that only means you have to wait longer for the migration to finish. In many cases this is not critical, really.
The CPU does the work while you do other things. Otherwise, you would have to spend your own time while the CPU sits idle.
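The fourth option above (a stand-alone row-by-row tool) could be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: it uses the `psycopg2` and `cassandra-driver` packages rather than Java, and the connection string, contact point, keyspace, table, and column names (`source_db`, `target_keyspace`, `users`, `id`, `name`, `email`) are all hypothetical placeholders, not taken from the question.

```python
from typing import Callable, Iterable, Sequence

Row = Sequence[object]


def migrate(rows: Iterable[Row],
            insert_row: Callable[[Row], None],
            report_every: int = 10_000) -> int:
    """Pass every source row to insert_row, one call per row; return the row count."""
    total = 0
    for total, row in enumerate(rows, start=1):
        insert_row(row)
        if total % report_every == 0:
            print(f"migrated {total} rows")
    return total


def run() -> None:
    # Drivers are imported here so the generic loop above has no dependencies.
    import psycopg2
    from cassandra.cluster import Cluster

    # Hypothetical connection details; replace with your own.
    pg = psycopg2.connect("dbname=source_db user=postgres")
    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("target_keyspace")

    # Prepare the statement once so each insert only ships the bound values.
    insert = session.prepare(
        "INSERT INTO users (id, name, email) VALUES (?, ?, ?)"
    )

    # A named cursor is a server-side cursor, so rows are streamed
    # instead of loading the whole table into memory.
    with pg.cursor(name="migration_cursor") as cur:
        cur.execute("SELECT id, name, email FROM users")
        migrate(cur, lambda row: session.execute(insert, row))

    pg.close()
    cluster.shutdown()
```

Calling `run()` starts the copy. Each insert here is a synchronous round trip, which is exactly why this approach is simple but slow; any per-row transformation would go inside the function passed as `insert_row`.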

Thank you very much!! :) – Annie Aug 02 '17 at 08:34
@S. Stas Is there any way to achieve the above by, for example, converting a Postgres dump file into a Cassandra dump file, or something along those lines? – Ishwar Chincholkar Aug 11 '17 at 07:02
Well, technically you can run pg_dump and get a list of SQL commands. You may then need to replace some of those commands in a text editor to account for Cassandra's specifics. The bad thing is that, as @Annie wrote, the database is quite big, so the .sql file would be even bigger. So the COPY command would be the more realistic way of input here. – S. Stas Aug 11 '17 at 09:17
If you prefer Apache Spark, you can use the Spark Cassandra Connector to save DataFrames to Cassandra.
See this question on how to connect to Postgres using PySpark.
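As a rough illustration of the Spark route, the sketch below reads a Postgres table over JDBC and appends it to a Cassandra table through the Spark Cassandra Connector's DataFrame source. It assumes the connector and the PostgreSQL JDBC driver jars are already on the Spark classpath and that the target table exists; the helper function names and every host, database, keyspace, table, and credential value are hypothetical.

```python
def postgres_read_options(host, database, table, user, password, port=5432):
    """Build the options for Spark's built-in JDBC source (values are placeholders)."""
    return {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }


def cassandra_write_options(keyspace, table):
    """Build the options for the Spark Cassandra Connector's DataFrame source."""
    return {"keyspace": keyspace, "table": table}


def copy_table(spark, pg_options, cassandra_options):
    """Read one Postgres table into a DataFrame and append it to Cassandra."""
    df = spark.read.format("jdbc").options(**pg_options).load()
    # Any column renaming or reshaping for the Cassandra schema would go here.
    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(**cassandra_options)
       .mode("append")
       .save())
```

A call might look like `copy_table(spark, postgres_read_options("pg-host", "source_db", "users", "postgres", "secret"), cassandra_write_options("target_keyspace", "users"))`, where `spark` is a `SparkSession` built with `spark.cassandra.connection.host` set to one of your Cassandra nodes.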
