Context
I have a somewhat large (about 30 GB) table that I would like to move from Postgres to S3. I am trying to wrap my head around how the file-like io.BytesIO() works and how much memory I need to provision on the machine, so that I can design my code better.
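My current understanding, illustrated with a minimal standalone sketch (not my actual code), is that io.BytesIO keeps everything written to it in process memory, and that getvalue() returns a second full copy:
import io

buf = io.BytesIO()
# Write 10 MB into the buffer; all of it now lives in RAM.
buf.write(b"x" * (10 * 1024 * 1024))
print(buf.getbuffer().nbytes)   # 10485760 -- size of the data held by the buffer
# getvalue() returns the contents as a separate bytes object (a full copy).
data = buf.getvalue()
print(len(data))                # 10485760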
What I have tried
I am using aiobotocore to move the data to S3 and asyncpg to query Postgres. I have built a sample to demonstrate the question better.
import asyncio
import gzip
import io

import aiobotocore
import asyncpg


async def happy_demo():
    # Define the connection to Postgres
    con = await asyncpg.connect(**postgres_credentials)
    # Get a handle onto an aiobotocore session
    session = aiobotocore.get_session()
    # Create an in-memory, file-like object
    file = io.BytesIO()
    # Create the S3 client
    async with session.create_client(**aws_credentials) as client:
        # Wrap the buffer in a gzip stream and copy the query result into it
        with gzip.GzipFile(fileobj=file, mode='wb') as gz:
            await con.copy_from_query(query='select * from bar', output=gz, format='csv')
        # Write the buffer contents to S3
        await client.put_object(Bucket="happy_bucket", Key="foo.csv", Body=file.getvalue())
    # Close the connection.
    await con.close()


def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(happy_demo())


if __name__ == '__main__':
    main()
What I am trying to understand
In particular, I am trying to understand: if the table size is around 30 GB, should I expect to need a machine with at least 30 GB of RAM to perform this operation?
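If it helps, here is a rough sketch of how I could instrument my sample to see how much data actually ends up in the buffer (measure_buffer_size is a hypothetical helper; postgres_credentials and the bar table are the same placeholders as above):
import gzip
import io

import asyncpg


async def measure_buffer_size():
    # Same placeholders as in the sample above.
    con = await asyncpg.connect(**postgres_credentials)
    file = io.BytesIO()
    with gzip.GzipFile(fileobj=file, mode='wb') as gz:
        await con.copy_from_query(query='select * from bar', output=gz, format='csv')
    # Size of the compressed CSV currently held in RAM by the buffer.
    print('buffer size (bytes):', file.getbuffer().nbytes)
    await con.close()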
Update
I have updated my code to write the CSV through gzip into the buffer, and slightly reformatted my question to include gzip.
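To make the gzip part concrete, here is the gzip-into-BytesIO pattern in isolation as I understand it: only the compressed bytes end up in the buffer (the repetitive sample data below is made up):
import gzip
import io

buf = io.BytesIO()
# Push ~100 MB of highly compressible data through gzip into the buffer.
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    for _ in range(100):
        gz.write(b"some,repetitive,csv,row\n" * 43690)  # ~1 MB per iteration
# The buffer holds only the compressed stream, not the raw ~100 MB.
print('compressed size in RAM (bytes):', buf.getbuffer().nbytes)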