
I'm having a hard time figuring out why an async def in FastAPI is blocking the main thread, and why canceling the request doesn't stop the task that is executing. I'm including my function, but any long-running function blocks the main thread if uvicorn runs with workers = 1; this is just one of the examples I have.

Here, uploadPdf blocks the main thread: while it's running, the server stops processing new requests until uploadPdf is finished (I can no longer call /firstTest). The snippet below can be tested.

import asyncio
from fastapi import FastAPI, Request, UploadFile, File
import uvicorn
import time
import io
import PyPDF2
from api.decorators import cancel_on_disconnect, threaded
app = FastAPI()

@app.get("/firstTest")
async def hello(request: Request):
    while not await request.is_disconnected():
        print("I am still alive")
        await asyncio.sleep(1)

    print("done")
    return "Hello, world"

#moved to a function for my attempt to threading and parallelism
@threaded
def processPages(pages):
    text = ''
    for i in range(len(pages)):
        pageObj = pages[i]
        text = text + pageObj.extract_text() + ' '
    return text

@app.post('/uploadPdf')
async def uploadPdf(request: Request, file: UploadFile = File(...)):
    print("uploading file")
    bytes = await file.read()
    doc = io.BytesIO(bytes)
    pdfReader = PyPDF2.PdfReader(doc)
    t4 = time.time()
    t = processPages(pdfReader.pages)
    text = t.result_queue.get()
    print(f'it took {time.time() - t4}s to extract text')
    return text

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)

What I tried:

1 - using threading.Thread, thread.start(), then thread.join()

2 - using asyncio, wrap the whole method with the @background decorator:

def background(f):
    def wrapped(*args, **kwargs):
        return asyncio.get_event_loop().run_in_executor(None, f, *args, **kwargs)

    return wrapped

3 - and a final @threaded decorator that uses Queue:

def threaded(f, daemon=False):
    import queue
    import threading
    def wrapped_f(q: queue.Queue, *args, **kwargs):
        '''this function calls the decorated function and puts the 
        result in a queue'''
        ret = f(*args, **kwargs)
        q.put(ret, block=False)
    def wrap(*args, **kwargs):
        '''this is the function returned from the decorator. It fires off
        wrapped_f in a new thread and returns the thread object with
        the result queue attached'''

        q = queue.Queue()

        t = threading.Thread(target=wrapped_f, args=(q,)+args, kwargs=kwargs)
        t.daemon = daemon
        t.start()
        t.result_queue = q        
        return t
    return wrap

Also, when I cancel the /uploadPdf request, the program doesn't stop until all the text has been extracted. Then, at the end, when all processing is done, the uvicorn process crashes because the client has disconnected.

mysticalnetcore
  • Please have a look at related answers [here](https://stackoverflow.com/a/73811351/17865804), as well as [here](https://stackoverflow.com/a/74419367/17865804) and [here](https://stackoverflow.com/a/70657621/17865804) on how to access the `SpooledTemporaryFile` that is behind the `UploadFile` object, in order to pass it to [`PyPDF2.PdfReader()`](https://pypdf2.readthedocs.io/en/latest/modules/PdfReader.html#the-pdfreader-class) instead of reading the file contents and then creating a BytesIO object. – Chris Mar 17 '23 at 17:48

1 Answer


asyncio is a library to run multiple functions concurrently based on an event loop.

The main idea with asyncio is that an async function does not yield execution to another async function until it returns or reaches an await statement. So if your long-running task is not designed around asyncio, it will block the only event loop.
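To make this concrete, here is a minimal sketch using only the standard library (no FastAPI). The names ticker, blocking_handler, cooperative_handler and ticks_during are made up for illustration; the ticker stands in for "other requests" waiting on the same event loop.

```python
import asyncio
import time


async def ticker(counter: dict) -> None:
    # Stand-in for other requests: bumps a counter every 10 ms,
    # but only when the event loop gives it a turn.
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(0.01)


async def blocking_handler() -> None:
    # No await inside: the loop cannot switch tasks until this returns.
    time.sleep(0.3)


async def cooperative_handler() -> None:
    # The await hands control back to the loop while waiting.
    await asyncio.sleep(0.3)


async def ticks_during(handler) -> int:
    # Count how many times the ticker ran while the handler was executing.
    counter = {"ticks": 0}
    task = asyncio.create_task(ticker(counter))
    await asyncio.sleep(0)  # give the ticker a chance to start
    before = counter["ticks"]
    await handler()
    after = counter["ticks"]
    task.cancel()
    return after - before


def run_demo() -> tuple:
    async def main():
        blocked = await ticks_during(blocking_handler)
        cooperative = await ticks_during(cooperative_handler)
        return blocked, cooperative

    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

The first number should be 0 (the ticker never ran while time.sleep held the loop), and the second should be well above 0. The exact second value depends on timer resolution.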

You have a few options:

  • Do not use async functions. FastAPI runs normal (def) functions in separate threads, so they will not block the main thread.
  • Use asyncio-based libraries, e.g. asyncio.sleep instead of time.sleep (blocking). Using asyncio is generally faster than handling each request in a separate thread.
  • Use asyncio.to_thread to run the blocking call in a worker thread and await its result.

TL;DR: Do not use async functions if you run blocking tasks.
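As a rough sketch of the asyncio.to_thread option (Python 3.9+): extract_text_blocking below is a hypothetical stand-in for the PyPDF2 loop, and heartbeat stands in for other requests on the same loop. Note that starting a thread yourself and then calling thread.join() or queue.get() inside an async def still blocks the loop, because those calls are synchronous; the point of to_thread is that the result is awaited.

```python
import asyncio
import time


def extract_text_blocking(pages: list) -> str:
    # Hypothetical stand-in for the slow, synchronous PyPDF2 extraction.
    time.sleep(0.3)
    return " ".join(pages)


async def heartbeat(counter: dict) -> None:
    # Stand-in for other requests being served by the same event loop.
    while True:
        counter["beats"] += 1
        await asyncio.sleep(0.01)


async def upload_handler(pages: list) -> str:
    # The blocking call runs in a worker thread; awaiting it suspends
    # this coroutine, so the event loop stays free for other tasks.
    return await asyncio.to_thread(extract_text_blocking, pages)


async def main() -> tuple:
    counter = {"beats": 0}
    task = asyncio.create_task(heartbeat(counter))
    text = await upload_handler(["page one", "page two"])
    task.cancel()
    return text, counter["beats"]


def run_demo() -> tuple:
    return asyncio.run(main())


if __name__ == "__main__":
    print(run_demo())
```

In a FastAPI endpoint the same await asyncio.to_thread(...) line would go inside the async def handler. One caveat: cancelling the awaiting coroutine does not stop the worker thread, which keeps running until the function returns, so this alone does not make the extraction abort when the client disconnects.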

  • Interestingly, if I define uploadPdf with def instead of async def and copy the body of processPages into uploadPdf, both methods execute concurrently without any problem – mysticalnetcore Mar 17 '23 at 13:54