I'm building a FastAPI endpoint where a web client user can download files which are stored in MongoDB as GridFS chunks. However, FastAPI's StreamingResponse doesn't accept the supposedly file-like AsyncIOMotorGridOut object returned by motor's open_download_stream method.

I already have an endpoint which takes files from a form and uploads them to MongoDB using the helper below. I would expect a download helper function to be similarly simple:

async def upload_file(db, file: UploadFile):
    """ Uploads file to MongoDB GridFS file system and returns ID to be stored with collection document """
    fs = AsyncIOMotorGridFSBucket(db)
    file_id = await fs.upload_from_stream(
        file.filename,
        file.file,
        # chunk_size_bytes=255*1024,  # default is 255 kB
        metadata={"contentType": file.content_type})
    return file_id

My first attempt is to use a helper like this:

async def download_file(db, file_id):
    """Returns AsyncIOMotorGridOut (non-iterable file-like object)"""
    fs = AsyncIOMotorGridFSBucket(db)
    stream = await fs.open_download_stream(file_id)
    # return download_streamer(stream)
    return stream

My FastAPI endpoint looks like this:

@app.get("/file/{file_id}")
async def get_file(file_id):
    file = await download_file(db, file_id)
    return StreamingResponse(file, media_type=file.content_type)

When trying to download a file with a valid file_id, I get this error: TypeError: 'AsyncIOMotorGridOut' object is not an iterator

My second attempt was to write a generator that iterates over chunks of the file:

async def download_streamer(file: AsyncIOMotorGridOut):
    """ Returns generator file-like object to be served by StreamingResponse
    https://fastapi.tiangolo.com/advanced/custom-response/#streamingresponse
    """
    chunk_size = 255*1024*1024
    for chunk in await file.readchunk():
        print(f"chunk: {chunk}")
        yield chunk

I then use the commented return download_streamer(stream) in my download_file helper, but for some reason, every chunk is just an integer of 255.

What's the best way to get a file out of MongoDB using motor and stream it as a FastAPI web response without using a temporary file? (I don't have access to a hard drive, and I don't want to hold the whole file in memory. I just want to stream files from MongoDB through FastAPI directly to the client, one chunk at a time.)

hamx0r

1 Answer

My solution is to create an async generator (Python 3.6+ syntax, per this SO answer). Such a generator works with the async variant of FastAPI's StreamingResponse, and it reads one GridFS chunk at a time (255 KB by default, per the motor docs) using the readchunk() method. The chunk size is set when the file is stored in MongoDB with upload_from_stream(). An alternative implementation would use .read(n) to read n bytes at a time. I chose readchunk() so that exactly one DB document is fetched at a time during the stream (each GridFS file is broken up into chunks, and each chunk is stored as its own document in the DB).

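from motor.motor_asyncio import AsyncIOMotorGridFSBucket


async def chunk_generator(grid_out):
    """Yields the file one GridFS chunk at a time"""
    while True:
        # Alternative: chunk = await grid_out.read(1024)
        chunk = await grid_out.readchunk()
        # readchunk() returns empty bytes once the file is exhausted
        if not chunk:
            break
        yield chunk


async def download_file(db, file_id):
    """Returns async generator over the chunks of an AsyncIOMotorGridOut object"""
    fs = AsyncIOMotorGridFSBucket(db)
    grid_out = await fs.open_download_stream(file_id)
    return chunk_generator(grid_out)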
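
As a side note on the second attempt in the question: readchunk() returns a single bytes object, and iterating over bytes in Python yields one int per byte. The for loop there therefore called readchunk() only once and walked over the bytes of that one chunk, which would explain the integers that were printed. A minimal illustration, using a made-up chunk rather than real GridFS data:

```python
# Hypothetical chunk contents, chosen only for illustration
chunk = bytes([255, 216, 255, 224])

# Iterating over a bytes object yields ints, one per byte, not sub-chunks
items = [item for item in chunk]
item_types = {type(item) for item in items}
```

This is why the generator above awaits readchunk() inside a while loop and yields the whole bytes object each time.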

A future improvement will be to have download_file() return a tuple that includes not only the generator but also metadata such as the content type.
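
A rough sketch of what that tuple-returning variant might look like is below. It assumes the content type was stored under the "contentType" metadata key, as in the upload helper from the question. Unlike the original, download_file() here takes the bucket (fs) instead of db so that it can be exercised without MongoDB. FakeGridOut, FakeBucket and demo() are hypothetical stand-ins for illustration and are not part of motor.

```python
import asyncio


async def chunk_generator(grid_out):
    """Yield one chunk at a time until readchunk() returns empty bytes."""
    while True:
        chunk = await grid_out.readchunk()
        if not chunk:
            break
        yield chunk


async def download_file(fs, file_id):
    """Return (async chunk generator, content type) for a stored file.

    `fs` is assumed to behave like AsyncIOMotorGridFSBucket.
    """
    grid_out = await fs.open_download_stream(file_id)
    # metadata may be None if nothing was stored at upload time
    metadata = grid_out.metadata or {}
    content_type = metadata.get("contentType", "application/octet-stream")
    return chunk_generator(grid_out), content_type


class FakeGridOut:
    """Hypothetical in-memory stand-in for AsyncIOMotorGridOut."""

    def __init__(self, chunks, metadata):
        self._chunks = list(chunks)
        self.metadata = metadata

    async def readchunk(self):
        # Mimic GridFS: empty bytes signal the end of the file
        return self._chunks.pop(0) if self._chunks else b""


class FakeBucket:
    """Hypothetical in-memory stand-in for AsyncIOMotorGridFSBucket."""

    def __init__(self, files):
        self._files = files

    async def open_download_stream(self, file_id):
        chunks, metadata = self._files[file_id]
        return FakeGridOut(chunks, metadata)


async def demo():
    fs = FakeBucket({"abc": ([b"hello ", b"world"], {"contentType": "text/plain"})})
    body, content_type = await download_file(fs, "abc")
    data = b"".join([chunk async for chunk in body])
    return data, content_type
```

In the endpoint, the two values could then be unpacked and passed on as StreamingResponse(body, media_type=content_type).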

hamx0r