
Context

I have a somewhat large (about 30 GB) table that I would like to move from Postgres to S3. I am trying to wrap my head around how a file-like io.BytesIO() works, and how much memory I need to provision on the machine so that I can design my code better.

What I have tried

I am using aiobotocore to move the data to S3 and asyncpg to query Postgres. I have built a sample to demonstrate the question better.

import asyncio
import gzip
import io

import aiobotocore
import asyncpg


async def happy_demo():
    # Define the connection to Postgres
    con = await asyncpg.connect(**postgres_credentials)
    # Get a handle onto an aiobotocore session
    session = aiobotocore.get_session()
    # Create file-like object
    file = io.BytesIO()
    # Create the S3 client
    async with session.create_client(**aws_credentials) as client:
        # Compress the COPY output into the in-memory buffer
        with gzip.GzipFile(fileobj=file, mode='wb') as gz:
            await con.copy_from_query(query='select * from bar', output=gz, format='csv')
        # Write to S3 (the gzip stream is only finalized once its with-block exits)
        await client.put_object(Bucket="happy_bucket", Key="foo.csv", Body=file.getvalue())
    # Close the connection.
    await con.close()


def main():
    loop = asyncio.get_event_loop()
    loop.run_until_complete(happy_demo())


if __name__ == '__main__':
    main()

What I am trying to understand

In particular, I am trying to understand: if the table is around 30 GB, should I expect to need a machine with at least 30 GB of RAM to perform this operation?
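
As far as I can tell, a BytesIO buffer keeps everything that is written to it in memory, so its footprint grows with the data. A tiny check (the 10 MB figure is just illustrative):

import io

buf = io.BytesIO()
buf.write(b'x' * (10 * 1024 * 1024))  # write 10 MB into the buffer
print(buf.getbuffer().nbytes)         # 10485760 -- the whole payload stays in RAM

If that also holds for my demo above, the buffer would end up holding the entire (gzipped) COPY output at once.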

Update

I have updated my code to write a gzipped file and slightly reworded my question to cover gzip.

Comments

  • Your code extracts all the data from the table in one single expression. The BytesIO object will require 30GB of RAM. Why do you expect *async* operation to magically chunk anything? – Serge Ballesta Jan 30 '21 at 17:47
  • Hi @SergeBallesta : ). I did not assume the async operation is relevant (hence why I did not add an async tag), sorry if that was confusing in my question. I just wanted to confirm whether "the BytesIO object will require 30GB of RAM" or whether there is any streaming (the data gets uploaded gradually from BytesIO to S3). – alt-f4 Jan 30 '21 at 18:00
  • I think you're better off using an actual file (so you can use disk for storage) rather than a 30gb all-in-memory `BytesIO` object. In terms of RAM usage, actually sending the file to AWS shouldn't take that much, since only a few kilobytes of the file will be read for each packet sent, and the entire file won't need to be loaded at once. If you _have_ 30gb of RAM available then this would probably be a performance hit, but otherwise the amount of swap your computer would require to accommodate 30gb of data would probably outweigh that. – Green Cloak Guy Jan 30 '21 at 18:06
  • Hi @GreenCloakGuy. I am happy to accept that as an answer if you post it : ) I have also updated my question and code to use gzip. I am slightly confused whether using gzip will actually help me need less RAM (a rough sketch of your disk-based suggestion is included after these comments). – alt-f4 Jan 31 '21 at 10:26
  • I am moving the question regarding Gzip to a separate question as it was not part of the official requirement [here](https://stackoverflow.com/questions/65970965/are-bytesio-objects-loaded-fully-into-memory-or-streamed) – alt-f4 Jan 31 '21 at 18:19
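
Based on @GreenCloakGuy's suggestion, here is a rough, untested sketch of the disk-backed variant I have in mind (the temporary-file handling and the ".csv.gz" key are just placeholders on my part): the COPY output is gzipped onto disk and only then handed to put_object, so the table never has to sit in RAM.

import asyncio
import gzip
import tempfile

import aiobotocore
import asyncpg


async def disk_backed_demo():
    con = await asyncpg.connect(**postgres_credentials)
    session = aiobotocore.get_session()
    async with session.create_client(**aws_credentials) as client:
        # Spool the COPY output through gzip into a temporary file on disk,
        # so only gzip's small internal buffers live in RAM.
        with tempfile.TemporaryFile() as tmp:
            with gzip.GzipFile(fileobj=tmp, mode='wb') as gz:
                await con.copy_from_query(query='select * from bar', output=gz, format='csv')
            tmp.seek(0)
            # Hand the open file to put_object, which reads it in chunks while
            # uploading. Note that a single PUT is capped at 5 GB, so an object
            # this size would need a multipart upload instead.
            await client.put_object(Bucket="happy_bucket", Key="foo.csv.gz", Body=tmp)
    await con.close()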

0 Answers