81

I have a MySQL database running in Amazon RDS, and I want to know how to export an entire table to CSV format.

I currently use MySQL server on Windows to query the Amazon database, but when I try to run an export I get an error, probably because there's no dedicated file server for Amazon RDS. Is there a solution to this?
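
For context: the kind of export that usually triggers this error is a SELECT ... INTO OUTFILE query (as the top answer below presumes), since it writes to the database server's own filesystem, which you cannot reach on RDS. A purely illustrative example of such an attempt, with hostname, credentials and table name as placeholders:

mysql -h mydbinstance.xxxxxxxx.us-east-1.rds.amazonaws.com -u admin -p \
  -e "SELECT * FROM mytable INTO OUTFILE '/tmp/mytable.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n'"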

Peter Mortensen
Kenny
  • `mysql`'s `--batch` flag outputs tab-separated fields, with the contents escaped. This is a common format and should not require translation to .csv; doing that translation with `sed` could mangle data in corner cases. – Eric Walker Jun 20 '14 at 22:07
  • Note that if mysql output is redirected, it automatically generates TSV output, while on screen it is printed in tabular format. So `mysql db -e 'query' > out.tsv` will produce TSV output, while the same command without redirection will print a tabular output. – Marco Marsala Jan 09 '23 at 08:42

7 Answers

133

Presumably you are trying to export from an Amazon RDS database via a SELECT ... INTO OUTFILE query, which runs into this indeed commonly encountered issue; see, for example, export database to CSV. The respective AWS team response confirms your assumption that the lack of server access prevents an export like that, and suggests an alternative approach: export your data in CSV format by selecting the data in the MySQL command-line client and piping the output through sed to reformat it as CSV, like so:

mysql -u username -p --database=dbname --host=rdshostname --port=rdsport --batch \
  -e "select * from yourtable" \
  | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > yourlocalfilename

User fpalero provides an alternative and supposedly simpler approach, if you know and specify the fields upfront:

mysql -uroot -ppassword --database=dbtest \
  -e "select concat(field1,',',field2,',',field3) FROM tabletest" > tabletest.csv
Peter Mortensen
Steffen Opel
  • Getting an error `ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)` – t q Jun 30 '13 at 21:48
  • The issue here is that if you have an enormous amount of data to dump, you will get an out-of-memory error. – Halfgaar Aug 06 '15 at 12:26
  • Actually, I just noticed `mysql` has a `--quick` option too, making it flush to disk immediately. – Halfgaar Aug 06 '15 at 13:05
  • What would be the correct LOAD DATA LOCAL INFILE to insert the CSV back into SQL? I tried the following with no luck: `LOAD DATA LOCAL INFILE dumpFile.sql REPLACE INTO TABLE schema.table_name IGNORE 1 LINES;` I think I need to add the "delimited by" and all the other stuff, but I haven't had any luck so far. – hectorg87 Nov 10 '15 at 11:03
  • `gsed 's/"/""/g;s/^/"/;s/$/"/;s/\t/","/g'`, which is more faithful to the CSV specification? – Judking Nov 24 '15 at 12:51
  • You might want to add '--skip-column-names' to your params. Thanks! – Eugene Kulabuhov Jul 04 '16 at 13:13
  • fpalero's answer won't work for NULL cases: `concat(anything, NULL)` is NULL. – Siva Praveen Mar 08 '19 at 08:33
  • I am not sure why this got so many upvotes. This is an awful solution. – Daniel Viglione Apr 19 '20 at 14:13
  • Please note --batch is implied when output is redirected. So `mysql db -e 'query' > out.tsv` will produce TSV output, while the same command without redirection will print a tabular output. – Marco Marsala Jan 09 '23 at 08:44
6

First of all, Steffen's answer works in most cases.

I recently encountered some larger and more complex outputs where `sed` was not enough, and I decided to come up with a simple utility to do exactly that.

I built a module called sql2csv that can parse the output of the MySQL CLI:

$ mysql my_db -e "SELECT * FROM some_mysql_table" 

+----+----------+-------------+---------------------+
| id | some_int | some_str    | some_date           |
+----+----------+-------------+---------------------+
|  1 |       12 | hello world | 2018-12-01 12:23:12 |
|  2 |       15 | hello       | 2018-12-05 12:18:12 |
|  3 |       18 | world       | 2018-12-08 12:17:12 |
+----+----------+-------------+---------------------+

$ mysql my_db -e "SELECT * FROM some_mysql_table" | sql2csv
 
id,some_int,some_str,some_date
1,12,hello world,2018-12-01 12:23:12
2,15,hello,2018-12-05 12:18:12
3,18,world,2018-12-08 12:17:12

You can also use the built-in CLI:

sql2csv -u root -p "secret" -d my_db --query "SELECT * FROM some_mysql_table;"

1,12,hello world,2018-12-01 12:23:12
2,15,hello,2018-12-05 12:18:12
3,18,world,2018-12-08 12:17:12

More information is available on sql2csv (GitHub).

Peter Mortensen
Gab
5

Assuming MySQL on RDS, an alternative is to use batch mode, which outputs tab-separated values and escapes newlines, tabs and other special characters. I haven't yet struck a CSV import tool that can't handle tab-separated data. So, for example:

$ mysql -h myhost.rds.amazonaws.com -u user -D my_database -p --batch --quick -e "SELECT * FROM my_table" > output.csv

As noted by Halfgaar, the --quick option flushes immediately, so it avoids out-of-memory errors for large tables. To quote strings (recommended), you'll need to do a bit of extra work in your query:

SELECT id, CONCAT('"', REPLACE(text_column, '"', '""'), '"'), float_column
  FROM my_table

The REPLACE escapes any double-quote characters in the text_column values. I would also suggest using ISO 8601 strings for datetime fields, so:

SELECT CONCAT('"', DATE_FORMAT(datetime_column, '%Y%m%dT%T'), '"') FROM my_table

Be aware that CONCAT returns NULL if you have a NULL column value.
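
If some of your columns can be NULL, a common workaround (a standard SQL idiom, not something specific to RDS) is to wrap the nullable columns in IFNULL (or COALESCE) before quoting them, so a NULL value becomes an empty quoted string instead of nulling the whole CONCAT. A minimal sketch, reusing the placeholder host, table and column names from above:

mysql -h myhost.rds.amazonaws.com -u user -D my_database -p --batch --quick \
  -e "SELECT id, CONCAT('\"', REPLACE(IFNULL(text_column, ''), '\"', '\"\"'), '\"'), float_column FROM my_table" > output.csv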

I've run this on some fairly large tables with reasonable performance. 600M rows and 23 GB of data took ~30 minutes when running the MySQL command in the same VPC as the RDS instance.

Peter Mortensen
AndyB
3

There is a newer way from AWS to do it: just use their DMS (Database Migration Service).

Here is the documentation on how to export table(s) to files on S3 storage: Using Amazon S3 as a target for AWS Database Migration Service - AWS Database Migration Service

You will have the option to export in two formats: CSV or Parquet.
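
For reference, creating the S3 target endpoint with the AWS CLI looks roughly like the sketch below; the identifier, role ARN and bucket names are placeholders, and you still need a source endpoint for the RDS instance plus a replication task, as described in the linked documentation:

aws dms create-endpoint \
  --endpoint-identifier my-s3-target \
  --endpoint-type target \
  --engine-name s3 \
  --s3-settings ServiceAccessRoleArn=arn:aws:iam::123456789012:role/my-dms-s3-role,BucketName=my-export-bucket,BucketFolder=exports,DataFormat=csv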

Peter Mortensen
Vladimir Gilevich
1

I'm using the Yii framework on EC2 connecting to an RDS MySQL instance. The key is to use fputcsv(). The following works perfectly, both on my localhost as well as in production.

$file = 'path/to/filename.csv';
$export_csv = "SELECT * FROM table";

// Run the query through Yii's DB layer; queryAll() returns all rows as arrays
$qry = Yii::app()->db->createCommand($export_csv)->queryAll();

// Write each row as a comma-delimited, double-quote-enclosed CSV line
$fh = fopen($file, "w+");
foreach ($qry as $row) {
    fputcsv($fh, $row, ',', '"');
}
fclose($fh);
Peter Mortensen
1

If you use Steffen Opel's solution, you'll notice that it generates a header that includes the 'concat' string literal. That is obviously not what you want; most likely you will want the actual column headers of your data.

This query will work without any modifications, other than substituting column names and table names:

mysql -h xxx.xxx.us-east-2.rds.amazonaws.com \
--database=mydb -u admin -p \
-e "SELECT 'column1','column2' 
UNION ALL SELECT column1,column2 
FROM table_name WHERE condition = value" > dataset.csv

I just opened the results in the Numbers OS X app and the output looks perfect.

Peter Mortensen
Daniel Viglione
0

With a very large table (~500M rows), even with --quick, nothing was being written to my export file and the process never finished (6+ hours). I wrote the following Bash script to get around this. Another bonus is that you get an indication of progress as each batch file is written.

This solution works well as long as you have a sequential column of some kind, e.g. an auto-incrementing integer PK or a date column. Make sure you have your date column indexed if you have a lot of data!

#!/bin/bash

# Maximum number of rows to export/total rows in table, set a bit higher if live data being written
MAX=500000000
# Size of each export batch
STEP=1000000

for (( c=0; c<= $MAX; c = c + $STEP ))
do
   mysql --port 3306 --protocol=TCP -h <rdshostname> -u <username> -p<password> --quick --database=<db> -e "select column1, column2, column3 from <table> order by <timestamp> ASC limit $STEP offset $c" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > export$c.csv
done

A slightly different approach, which may be faster depending on the indexing you have in place, is to step through the data month by month:

#!/bin/bash
START_YEAR=2000
END_YEAR=2022
for (( YEAR=$START_YEAR; YEAR<=$END_YEAR; YEAR++ ))
do
   for (( MONTH=1; MONTH<=12; MONTH++ ))
   do

      NEXT_MONTH=1
      let NEXT_YEAR=$YEAR+1
      if [ $MONTH -lt 12 ]
      then
         let NEXT_MONTH=$MONTH+1
         NEXT_YEAR=$YEAR
      fi
            
      mysql --port 3306 --protocol=TCP -h <rdshostname> -u app -p<password> --quick --database=<database> -e "select column1, column2, column3 from <table> where <dateColumn> >= '$YEAR-$MONTH-01 00:00:00' and <dateColumn> < '$NEXT_YEAR-$NEXT_MONTH-01 00:00:00' order by <dateColumn> ASC" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > export-$YEAR-$MONTH-to-$NEXT_YEAR-$NEXT_MONTH.csv
   done
done

Hopefully this helps someone.

Sam Shiles