I'm working on a Spring project that needs exporting Redshift table data into local a single CSV file. The current approach is to:
- Execute Redshift UNLOAD to write data across multiple files to S3 via JDBC
- Download said files from S3 to local
- Joining them together into one single CSV file
UNLOAD (
'SELECT DISTINCT #{#TYPE_ID}
FROM target_audience
WHERE #{#TYPE_ID} is not null
AND #{#TYPE_ID} != \'\'
GROUP BY #{#TYPE_ID}'
)
TO '#{#s3basepath}#{#s3jobpath}target_audience#{#unique}_'
credentials 'aws_access_key_id=#{#accesskey};aws_secret_access_key=#{#secretkey}'
DELIMITER AS ',' ESCAPE GZIP ;
The above approach has been fine and all. But i think the overall performance can be improved by, for example skipping the S3 part and get data directly from Redshift to local.
After searching through online resources, i found that you can export data from redshift directly through psql or to perform SELECT queries and move the result data myself. But neither option can top Redshift UNLOAD performance with parallel writing.
So is there any way i can mimic UNLOAD parallel writing to achieve the same performance without having to go through S3 ?