
I've got a big table (~500m rows) in MySQL on RDS and I need to export specific columns from it to CSV, to enable import into QuestDB.

Normally I'd use `INTO OUTFILE`, but this isn't supported on RDS as there is no access to the file system.

I've tried using MySQL Workbench to do the export, but due to the size of the table I keep getting out-of-memory issues.

Sam Shiles

3 Answers


Finally figured it out with help from this: Exporting a table from Amazon RDS into a CSV file

This solution works well as long as you have a sequential column of some kind, e.g. an auto-incrementing integer PK or a date column. Make sure your date column is indexed if you have a lot of data!
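For reference, a minimal, hypothetical sketch of adding that index via the mysql client; the index name and column placeholders below are illustrative, not taken from the question:

# hypothetical: index the date column used for ordering/stepping in the scripts below
mysql --port 3306 --protocol=TCP -h <rdshostname> -u <username> -p<password> --database=<db> \
   -e "CREATE INDEX <idx_dateColumn> ON <table> (<dateColumn>);"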

#!/bin/bash

# Maximum number of rows to export/total rows in table, set a bit higher if live data being written
MAX=500000000
# Size of each export batch
STEP=1000000

mkdir -p parts

for (( c=0; c<= $MAX; c = c + $STEP ))
do
   FILE_NAME="export$c"
   mysql --port 3306 --protocol=TCP -h <rdshostname> -u <username> -p<password> --quick --database=<db> -e "select column1, column2, column3 from <table> order by <timestamp> ASC limit $STEP offset $c" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > $FILE_NAME.csv

   # split down into chunks under QuestDB's 65k line limit
   split -d -l 64999 --additional-suffix=.csv $FILE_NAME.csv ./parts/$FILE_NAME

done

# print out import statements to a file
for i in $(ls -v ./parts); do echo "COPY reading from '$i';" >> import.sql; done;

A slightly different approach, which may be faster depending on the indexing you have in place, is to step through the data month by month:

#!/bin/bash
START_YEAR=2020
END_YEAR=2022

mkdir -p parts

for (( YEAR=$START_YEAR; YEAR<=$END_YEAR; YEAR++ ))
do
   for (( MONTH=1; MONTH<=12; MONTH++ ))
   do

      NEXT_MONTH=1
      let NEXT_YEAR=$YEAR+1
      if [ $MONTH -lt 12 ]
      then
         let NEXT_MONTH=$MONTH+1
         NEXT_YEAR=$YEAR
      fi

      FILE_NAME="export-$YEAR-$MONTH-to-$NEXT_YEAR-$NEXT_MONTH"

      mysql --port 3306 --protocol=TCP -h <rdshost> -u app -p<password> --quick --database=<database> -e "select <column1>, <column2>, round(UNIX_TIMESTAMP(<dateColumn>)) * 1000000 as date from <table> where <table>.<dateColumn> >= '$YEAR-$MONTH-01 00:00:00' and <table>.<dateColumn> < '$NEXT_YEAR-$NEXT_MONTH-01 00:00:00' order by <table>.<dateColumn> ASC" | sed 's/\t/","/g;s/^/"/;s/$/"/;s/\n//g' > $FILE_NAME.csv

      # split down into chunks under QuestDB's 65k line limit
      split -d -l 64999 --additional-suffix=.csv $FILE_NAME.csv ./parts/$FILE_NAME
   done
done

# print out import statements to a file
for i in $(ls -v ./parts); do echo "COPY reading from '$i';" >> import.sql; done;

The above scripts will output an import.sql file containing all the SQL statements you need to import your data. See: https://questdb.io/docs/guides/importing-data/
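As a rough, hypothetical sketch of the import side (not part of the answer above): instead of running import.sql, each part could also be uploaded over QuestDB's HTTP import endpoint described in the linked guide, assuming a default local instance on port 9000 and a target table named reading:

#!/bin/bash
# hypothetical alternative to import.sql: push each chunk to QuestDB's /imp
# HTTP endpoint (assumes QuestDB on localhost:9000 and a table named "reading")
for f in $(ls -v ./parts); do
   curl -F data=@./parts/$f "http://localhost:9000/imp?name=reading"
done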

Sam Shiles
  • Now updated the answer to include splitting the export into chunks smaller than 65k which is a limit imposed by questdb when importing in this manner. – Sam Shiles Oct 24 '22 at 12:18
  • I'm pretty sure `split` can process its standard input, so you can skip the temporary file. – Bill Karwin Oct 24 '22 at 13:13
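As the last comment points out, `split` can read from standard input, so the temporary export$c.csv in the first script can be skipped entirely. A minimal sketch of that variant for the body of the loop, using the same placeholders as the answer above:

# pipe the query output straight into split, no intermediate file
mysql --port 3306 --protocol=TCP -h <rdshostname> -u <username> -p<password> --quick --database=<db> \
   -e "select column1, column2, column3 from <table> order by <timestamp> ASC limit $STEP offset $c" \
   | sed 's/\t/","/g;s/^/"/;s/$/"/' \
   | split -d -l 64999 --additional-suffix=.csv - ./parts/export$c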

Edit: this solution only works when exporting the whole table, not when exporting specific columns

You could try using mysqldump with extra params for CSV conversion. AWS documents how to use mysqldump with RDS, and this Stack Overflow question shows how to use extra params to convert the output into CSV.

I am quoting the relevant part from that last link here (since there are a lot of answers and comments):

mysqldump <DBNAME> <TABLENAME> --fields-terminated-by ',' \
--fields-enclosed-by '"' --fields-escaped-by '\' \
--no-create-info --tab /var/lib/mysql-files/
Javier Ramirez
  • As far as I know, you can't specify a subset of columns with mysqldump, given the size of this table, I want to be as frugal as possible in the data I export. – Sam Shiles Oct 24 '22 at 10:43

You can use the SELECT ... INTO OUTFILE syntax to export the data to a file on the server. You can then use the mysql command line client to connect to the RDS instance and retrieve the file from the server. The only slight snag is that mysql won't connect to the RDS instance unless the instance is in a VPC, so if it isn't you'll need to connect to a bastion host first, then connect to the RDS instance from there.

SELECT * FROM mydb.mytable INTO OUTFILE '/tmp/mytable.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

You can then get the file from the server:

mysql -uusername -p -hmyrds.rds.amazonaws.com -P3306

When you have a prompt from the mysql command line client you can retrieve the file using the SELECT command:

SELECT LOAD_FILE('/tmp/mytable.csv');

You can then pipe the output to a file using:

SELECT LOAD_FILE('/tmp/mytable.csv') INTO OUTFILE '/tmp/mytable_out.csv';

You can then use the mysql command line client to connect to your QuestDB instance and load the data.

If you want to retrieve specific columns then you can specify the column names in the SELECT command when creating the file on the RDS server:

SELECT column1, column2, column3 FROM mydb.mytable INTO OUTFILE '/tmp/mytable.csv' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\n';

Neel Shah
  • Error Code: 1227. Access denied; you need (at least one of) the FILE privilege(s) for this operation. Pretty sure this is a well-known limitation of RDS? – Sam Shiles Oct 25 '22 at 12:44