0

I'm going to run a query that returns a huge table (about 700 MB) from Redshift and save it to CSV using SQLAlchemy and Python 2.7 on my local machine (Mac Pro). I've never done this with such a huge query before, and obviously there could be memory and other issues.

My question is: what should I take into account, and how should I use SQLAlchemy to make the process work?

Thanks, Alex

user912830823

2 Answers

3

Ideally, you'd use the UNLOAD command, which writes the result to S3 storage; you can then download the files to your local machine. Here is an example:

UNLOAD ('select * from my_table')
  TO 's3://bucket_name/path/to/my_filename_prefix'
  WITH CREDENTIALS
    'aws_access_key_id=<my_access_key>;
    aws_secret_access_key=<my_secret_key>'
  MANIFEST
  GZIP
  ALLOWOVERWRITE
  ESCAPE
  NULL AS '\\N'

Reference: http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD_command_examples.html
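If you want to drive this from Python rather than a SQL client, here is a minimal sketch (assuming the sqlalchemy-redshift dialect and boto3 are installed; the connection URL, bucket, and prefix are placeholders) that runs the UNLOAD through SQLAlchemy and then downloads the resulting files:

import os

import boto3
import sqlalchemy as sa

# placeholder connection URL; requires the sqlalchemy-redshift dialect
engine = sa.create_engine('redshift+psycopg2://user:password@host:5439/dbname')

unload_sql = sa.text(r"""
    UNLOAD ('select * from my_table')
    TO 's3://bucket_name/path/to/my_filename_prefix'
    WITH CREDENTIALS
      'aws_access_key_id=<my_access_key>;aws_secret_access_key=<my_secret_key>'
    MANIFEST
    GZIP
    ALLOWOVERWRITE
    ESCAPE
    NULL AS '\\N'
""")

# engine.begin() runs the statement and commits when the block exits
with engine.begin() as conn:
    conn.execute(unload_sql)

# download the gzipped part files (and the manifest) that UNLOAD produced
s3 = boto3.client('s3')
bucket = 'bucket_name'
prefix = 'path/to/my_filename_prefix'
for obj in s3.list_objects_v2(Bucket=bucket, Prefix=prefix).get('Contents', []):
    key = obj['Key']
    s3.download_file(bucket, key, os.path.basename(key))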

If a local query is the only or preferred option, here are two things to limit memory usage:

  1. Minimize operations

  2. Loop through the query results one row at a time instead of fetching everything at once (see the streaming sketch just below this list)
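One caveat with option 2: with the psycopg2 driver (the usual choice for Redshift), a plain fetchone() loop can still buffer the whole result set on the client side. Here is a hedged sketch of asking SQLAlchemy for a server-side cursor via the stream_results execution option, with a placeholder connection URL and query:

import sqlalchemy as sa

engine = sa.create_engine('redshift+psycopg2://user:password@host:5439/dbname')

with engine.connect() as conn:
    # stream_results makes psycopg2 use a server-side cursor,
    # so rows arrive in batches instead of all at once
    conn = conn.execution_options(stream_results=True)
    result = conn.execute(sa.text('select * from my_table'))
    for row in result:
        pass  # process/write each row here without holding them all in memory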

Here is an example snippet for the case where you need to edit each row before writing it:

import csv

result = conn.execute(s)
row = result.fetchone()

with open(outfile, 'ab') as f:
    writer = csv.writer(f)
    while row is not None:
        # apply your edits to the row here, then write it out
        writer.writerow(row)
        row = result.fetchone()

Otherwise just write all rows:

import csv

with open(outfile, 'wb') as f:
    writer = csv.writer(f)
    # dump column titles (optional)
    writer.writerow(result.keys())
    # dump rows (fetchall() loads the whole result into memory at once)
    writer.writerows(result.fetchall())

snippet credit: https://stackoverflow.com/a/2952829/7853322

brddawg
0

If you don't run much else on that machine, memory should not be an issue. Give it a try and monitor memory use during the execution. Also watch the load average (e.g. with uptime or top) to see how much pressure is on the system.
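For example, a rough way to check this from inside the script with the standard library (note that ru_maxrss is reported in bytes on macOS but kilobytes on Linux):

import os
import resource

# peak resident memory of this process so far
peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
# 1-, 5- and 15-minute system load averages
load1, load5, load15 = os.getloadavg()
print 'peak rss: %s, load: %.2f %.2f %.2f' % (peak, load1, load5, load15)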