
I have x GB (x varies from 25-40 GB) of daily data which resides in Cassandra, and I want to export it to a file. So I came across this SO link, which lets you export the data of a query with a format like this:

select column1, column2 from table where condition = xy

So I scheduled the same method in a cron job. But due to the huge amount of data, the process gets killed while writing to the text file. So, what are the other options for exporting this huge data set, given the query format?

Naresh
  • If the huge amount of data being written to the file is *really* the problem, there is no solution to your problem, as any proposed solution will write the same amount of data. What _precisely_ happens when you try to write the file? – Raedwald Mar 18 '16 at 09:47

2 Answers


Have you looked into using Spark to retrieve and process your data? If you are using DataStax, you have this as part of your installation (DSE Analytics). With Spark you should be able to read the data from your C* instance and write it to a text file without the limitations of a direct CQL statement. A rough sketch of what this could look like is shown below.
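
Here is a hedged PySpark sketch using the DataStax spark-cassandra-connector; the keyspace, table, column names, connection host and output path are placeholders (not from the original answer), and the exact connector setup depends on your DSE/Spark version:

    # Sketch: export a filtered Cassandra table to CSV with Spark.
    # Assumes the spark-cassandra-connector is available on the classpath.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("cassandra-export")
             .config("spark.cassandra.connection.host", "127.0.0.1")  # assumption: local node
             .getOrCreate())

    df = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="my_keyspace", table="my_table")  # placeholder names
          .load()
          .select("column1", "column2")
          .filter("condition = 'xy'"))

    # Spark writes the result in parallel as several part-files rather than one big
    # text file, so no single process has to hold all 25-40 GB in memory.
    df.write.csv("/path/to/export_dir", header=True, mode="overwrite")

    spark.stop()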

bechbd

Have a look at the following Python script, which uses paging (the fetch_size option) to pull a huge data set from Cassandra without timing out:

    query = "SELECT * FROM table_name"
    statement = SimpleStatement(query, fetch_size=100)
    for user_row in session.execute(statement):  # pages are fetched as you iterate
        for rw in user_row:                      # iterate over the columns of each row
            print(rw)

This works very efficiently for me. I didn't include the Cassandra connection code; you can easily find example code for connecting to Cassandra from Python (a fuller sketch is included below).
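
For completeness, here is a more complete sketch of the same paging approach that also writes the rows to a CSV file; the contact point, keyspace, table and output path are assumptions, not part of the original answer:

    # Sketch: page through a Cassandra table and stream the rows into a CSV file.
    import csv
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])          # assumption: local contact point
    session = cluster.connect("my_keyspace")  # assumption: keyspace name

    statement = SimpleStatement("SELECT column1, column2 FROM my_table", fetch_size=1000)

    with open("export.csv", "w", newline="") as f:
        writer = csv.writer(f)
        # The driver fetches the next page transparently as the iterator advances,
        # so only fetch_size rows are held in memory at any time.
        for row in session.execute(statement):
            writer.writerow(row)

    cluster.shutdown()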