62

I'm new in cassandra, and I have to export the result of a specific query to a csv file.

I found the COPY command, but (from what I understand) it allows you only to copy an already existing table to a csv file, and what I want is to copy directly the stdout of my query to the csv file. is there any way to do it with COPY command or with another way ?

My command is style (select column1, column2 from table where condition = xy) and I'm using cqlsh.

Aaron
  • 55,518
  • 11
  • 116
  • 132
Dady09
  • 649
  • 2
  • 7
  • 10

15 Answers15

73

If you don't mind your data using a pipe ('|') as a delimiter, you can try using the -e flag on cqlsh. The -e flag allows you to send a query to Cassandra from the command prompt, where you could redirect or even perform a grep/awk/whatever on your output.

$ bin/cqlsh -e'SELECT video_id,title FROM stackoverflow.videos' > output.txt
$ cat output.txt

 video_id                             | title
--------------------------------------+---------------------------
 2977b806-df76-4dd7-a57e-11d361e72ce1 |                 Star Wars
 ab696e1f-78c0-45e6-893f-430e88db7f46 | The Witches of Whitewater
 15e6bc0d-6195-4d8b-ad25-771966c780c8 |              Pulp Fiction

(3 rows)

Older versions of cqlsh don't have the -e flag. For older versions of cqlsh, you can put your command into a file, and use the -f flag.

$ echo "SELECT video_id,title FROM stackoverflow.videos;" > select.cql
$ bin/cqlsh -f select.cql > output.txt

From here, doing a cat on output.txt should yield the same rows as above.

Aaron
  • 55,518
  • 11
  • 116
  • 132
  • 1
    Yeh it should resolve my problem, and that what i want to do. exept that it doesn't recognize the -e option. Usage: cqlsh [options] [host [port]] ----- i missed anything ? cqlsh: error: no such option: -e – Dady09 Nov 13 '14 at 15:21
  • 3
    @user3196353 You must be running on an older version. Edit made...try -f with a command file instead. – Aaron Nov 13 '14 at 15:45
  • another quick question - how do i do that when i have where and string parametes: cqlsh -e'SELECT * from dev.photos where uid='score_test4'' > output.txt - does not work – arseny Mar 14 '16 at 15:44
  • 1
    @arseny In that case, you can also use `cqlsh -e` with double-quotes: `cqlsh -e"SELECT * from dev.photos where uid='score_test4';" > output.txt` – Aaron Mar 14 '16 at 15:58
  • 10
    This gives 100 results into the CSV, if we have more results than that which are retrieved by clicking 'MORE' in the interactive mode, how do we go about getting them the CSV? – Jagrati Gogia Mar 29 '18 at 07:27
  • 12
    @JagratiGogia prepend your query with `PAGING OFF;`. E.g. `cqlsh -e "PAGING OFF; SELECT * FROM stackoverflow.videos;"` – Zoltán Dec 19 '18 at 09:25
  • ```cqlsh -e 'select word from inventory.blacklisted_token' Usage: cqlsh.py [options] [host [port]] cqlsh.py: error: 'from' is not a valid port number.``` does not seem to work for me – linehrr Aug 19 '20 at 01:30
  • Can we do formatting of output.txt file? When i open the file,all the values of title column are right aligned and spaces are being appended at the beginning of the value. can we align them left? – Madhuri Nov 09 '20 at 07:09
50
  1. Use CAPTURE command to export the query result to a file.
cqlsh> CAPTURE
cqlsh> CAPTURE '/home/Desktop/user.csv';
cqlsh> select *from user;
Now capturing query output to '/home/Desktop/user.csv'.

Now, view the output of the query in /home/Desktop/user.csv

  1. Use DevCenter and execute a query. Right click on the output and select "Copy All as CSV" to paste the output in CSV.

enter image description here

Arun
  • 1,692
  • 15
  • 24
  • Option 2 is simple, except that DevCenter limits the output rows to 1000, unless you are using 1.6: http://stackoverflow.com/questions/37862375/why-does-devcenter-of-datastax-has-row-restrictions-to-1000 – Cincinnati Joe Mar 27 '17 at 12:47
  • 2
    This (option 1) seems to be a much better option than the accepted answer. – Ken Williams Jun 08 '18 at 15:39
  • 2
    Actually, I think I misunderstand - option 1 doesn't output in CSV format, it just redirects the existing output to a file whose name ends in `.csv`, right? – Ken Williams Jun 08 '18 at 16:24
  • 1
    yes. you can use "COPY TO" to export data from a table into a CSV file. – Arun Jun 08 '18 at 23:48
13

In 2020th you can use DSBulk to export or import data to/from CSV (by default), or JSON. It could be as simple as:

dsbulk unload -k keyspace -t table -u user -p password -url filename

DSBulk is heavily optimized for fast data export, without putting too much load onto the coordinator node that happens when you just run select * from table.

You can control what columns to export, and even provide your own query, etc. See following blog posts for examples:

Alex Ott
  • 80,552
  • 8
  • 87
  • 132
11

I just wrote a tool to export CQL query to CSV and JSON format. Give it a try :)

https://github.com/tenmax/cqlkit

popcorny
  • 1,710
  • 16
  • 16
8

You can use the COPY command to create the CSV file. e.g. copy table with selected columns. Columns are optional, if you select them, every column will be picked.

COPY TABLE (COL1, COL2) TO 'filename.csv' HEADER=TRUE/FALSE

For more reference https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html

Akhila
  • 3,235
  • 1
  • 14
  • 30
Arvind
  • 99
  • 1
  • 1
7

With bash:

If you need to query the data (not possible with COPY TO) and if you need the final product to be importable (ie with COPY FROM):

cqlsh -e "SELECT * FROM bar WHERE column = 'baz' > raw_output.txt

Then you can reformat the output with sed

sed 's/\ //g; /^----.*/d; /^(/d; /^\s*$/d;' raw_output.txt | tee clean_output.csv

Which pretty much says

sed 'remove spaces; remove the column boarder; remove lines beginning with (COUNT X); and remove blank lines' | write output into clean_output.csv

The sed regexp's could be cleaned up to better suite your specific case, but thats the general idea.

Jake
  • 251
  • 4
  • 4
6

Cannot comment... To deal with "MORE" issue when there are more than 100 rows, simply add "paging off" before the SQL.

Something like

$ bin/cqlsh -e'PAGING OFF;SELECT video_id,title FROM stackoverflow.videos' > output.txt

This will cause a little messy at the beginning of the output file but can easily be removed afterwards.

luckyee
  • 88
  • 2
  • 6
5

I believe DevCenter also allows you to copy to CSV. http://www.datastax.com/what-we-offer/products-services/devcenter

phact
  • 7,305
  • 23
  • 27
4

In windows, double quotes should be used to enclose the CQL.

cqlsh -e"SELECT video_id,title FROM stackoverflow.videos" > output.txt

Karthik Sankar
  • 817
  • 8
  • 11
3

If I am understanding correctly you want to redirect your output to stdout?

Put your cql command in a file. My files is called select.cql and contents are:

select id from wiki.solr limit 100;

Then issue the following and you get it to stdout:

cqlsh < select.cql

I hope this helps. From there on you can pipe it and add commas, remove headers etc.

Bereng
  • 724
  • 4
  • 5
  • 1
    Thank you for the response, but it's not what i really need, i'm searching for redirecting the output of my sql query to a csv file (stdout > file.csv) so i can use it in other jobs – Dady09 Nov 13 '14 at 14:31
  • 1
    Yep, that's what that does. The result of the query stored in the cql file is outputted to stdout. Sometimes, for complex or long queries, you store them in a file rather than write them in the cmd line. Then you can 'send' the file with the query to cql and you get the output to stdout. For one-liners and simple queries there is the other method as well. – Bereng Nov 14 '14 at 09:31
  • 1
    in hindsight, you're right, i must have misunderstood you the first time, the other answer seemed a little more clear to me, sorry – Dady09 Nov 18 '14 at 16:08
2

As the other guys have suggested, export the standard query output using ./cqlsh -e 'SELECT ...' > data.csv.

Once you have this you can easiliy replace the pipes ( | ) with commas using Excel (if you have it installed).

  1. First open your file in a text editor (vi/notepad++) and delete the separator that Cass puts in (-----+-------+---), as well as the info on the number of rows from the bottom.
  2. Open a new Excel workbook.
  3. Click on the Data tab.
  4. Click on "From Text/CSV" (top left).
  5. Select your file, specifiy the pipe symbol as a delimiter, click Load.
  6. That will create an .xlsx file so you'll have to Save As .csv manually. That will strip Excel's formatting and leave you with commas.
1

Here, when using Cassandra on a docker container with SSL and authentication:

docker exec -it <cassandra_container> cqlsh -u <user> -p <password> --ssl -e  "select column1, column2 from table where condition = xy" > output.txt
MAvim
  • 55
  • 5
0

CQL COPY is good option for importing or exporting data. But if you want to analyze some small query output you can run below command and save the output in a file.

cqlsh -e "SELECT * FROM table WHERE column = 'xyz' > queryoutput.txt

However, you can use CAPTURE also for saving output of the query to analyze something

LetsNoSQL
  • 1,478
  • 1
  • 11
  • 23
0

Follow the below steps to selectively export & import the Cassandra data.

Exporting:

  • Write all the select queries in a file named dump.cql like below

    paging off;

    select * from student where id=10;

    select * from student where id=15;

Note: Paging off is mandatory above the queries to avoid limiting the query results to default 100 records

  • Creating a dump

cqlsh -u user_name -p 'password' ip_address -k keyspace_name -f dump.cql > dump.csv;

(for remote machine)

or

cqlsh -k keyspace_name -f dump.cql > dump.csv;

(for local machine)

  • Removing whitespace characters from dump(It avoids removing whitespace withing json data)

sed -r 's/(\".*\")|\s*/\1/g' dump.csv > data_without_spaces.csv

Importing:

cqlsh -e "copy keyspace_name.table_name from 'data_without_spaces.csv' with delimiter = '|';"

Chirram Kumar
  • 346
  • 5
  • 8
  • it's bad idea to use `cqlsh` for that - there are specialized tools, like DSBulk ot do that – Alex Ott Sep 25 '20 at 09:13
  • The question has asked about taking dump of a specific query, I don't think DSBulk will support that. – Chirram Kumar Sep 28 '20 at 13:27
  • 1
    it perfectly supports that... via `-query` option, although for that query it's not required - it's just need to export 2 specific columns. And given query is really scanning of the whole table that isn't very well supported in the `cqlsh` if you have a lot of data – Alex Ott Sep 28 '20 at 14:17
-1

The person asking asked for CSV not text.

I did this hack get my results. It worked for me and I moved on with my day.

me:~/MOOSE2# echo "USE ████it; select * from samples_daily_buffer where dog_id=██48;" | cqlsh --cqlversion="3.4.4" cassandra0.stage.███████ | sed -e "s/ | */,/g" | sed -e "s/^ *//g" | tail -n +4 > ./myfile.csv