I love FastAPI. However, I'm not well-versed in the nuances of the "guts" behind its magic (uvicorn, watchgod, etc.).
I'm developing an API that will let users interact with a Hugging Face transformer model. Specifically, I'm using a RobertaForQuestionAnswering model.
However, I'm running into a few problems that I'm unable to debug with the VSCode debugger. Trying to "step into" the problem doesn't yield anything at all.
Problems:
- Even when I declare the RobertaForQuestionAnswering model at the top of my routes file, like so:
from transformers import RobertaForQuestionAnswering, RobertaTokenizer, pipeline

tokenizer = RobertaTokenizer.from_pretrained(data_dir)
model = RobertaForQuestionAnswering.from_pretrained(data_dir)
transformer_pipeline = pipeline(task="question-answering", model=model, tokenizer=tokenizer)

@router.post("/test")
async def test():
    return transformer_pipeline(question="What is my name?", context="My name is Joe")
FastAPI starts a new subprocess EVERY TIME the model is invoked, and re-creates the model EVERY TIME, which is very time-consuming. I have tried FastAPI's dependency injection and a few other methods, but I think these efforts are futile, since FastAPI seems to create a new ephemeral subprocess for every invocation of the transformer model (see the PID diagnostic after this list).
- When running the FastAPI app with uvicorn and a single worker (the default), the API "shuts down" gracefully whenever the transformer model is run. So I need to run the app with workers=2, like so:

uvicorn.run("project_name.api.webapp:init_fastapi_application", host=host, port=port, log_level="debug", workers=2)
where init_fastapi_application is just a custom function that returns the app object. Running with multiple workers is a total hack around the "graceful shutdown" problem I'm having, and I frustratingly can't figure out why it fixes it!
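To sanity-check the first problem, here is a small diagnostic I could drop into the routes file (the /pid route is hypothetical, added purely for illustration). If the printed PID changes between requests, new processes really are being spawned:

import os

print(f"routes module imported in PID {os.getpid()}")  # re-printed on every re-import

@router.get("/pid")
async def pid():
    # Compare this value across requests to see which worker process answered.
    return {"pid": os.getpid()}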
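(Side note on the uvicorn call: since the import string points at a factory function rather than an app instance, my understanding is that recent uvicorn versions auto-detect this but log a warning recommending an explicit factory=True, e.g.:)

uvicorn.run(
    "project_name.api.webapp:init_fastapi_application",
    host=host,
    port=port,
    log_level="debug",
    workers=2,
    factory=True,  # mark the import string as an app factory, not an app instance
)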
TL;DR:
- How/why/when does FastAPI create new subprocesses, and why are they created every time I invoke a Hugging Face transformer model?
- Why does running FastAPI with a single worker fail when interacting with the transformer model?
Tried:
- Dependency injection with FastAPI's Depends
- await-ing the long transformer model creation
- Using an @asynccontextmanager lifespan
- Using @app.on_event("startup") to load the transformer model at app startup
- Creating a singleton TransformerQA class (sketched below) to force FastAPI to share the same object instance, which is futile because FastAPI creates a new subprocess every time
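For context, the singleton wrapper that shows up in the EDIT below looks roughly like this (a sketch; the real class has more configuration and error handling):

from transformers import RobertaForQuestionAnswering, RobertaTokenizer, pipeline

class TransformerQA:
    ''' Holds one question-answering pipeline per process. '''
    _instance = None

    def __init__(self):
        tokenizer = RobertaTokenizer.from_pretrained(data_dir)
        model = RobertaForQuestionAnswering.from_pretrained(data_dir)
        self.model = pipeline(task="question-answering", model=model, tokenizer=tokenizer)

    @classmethod
    def get_instance(cls):
        # Build the pipeline once per process and reuse it afterwards.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance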
EDIT: I was able to access my state like so:
webapp.py:
from contextlib import asynccontextmanager

from fastapi import FastAPI

def init_fastapi_application():
    @asynccontextmanager
    async def lifespan(app: FastAPI):
        # Runs at startup:
        # initialise the pipeline and add it to request.state.
        transformer_pipeline = TransformerQA.get_instance().model
        yield {'transformer_pipeline': transformer_pipeline}
        # Runs on shutdown:
        # close connections, clear variables, and release resources.
        print("in asynccontextmanager shutdown block")

    app = FastAPI(lifespan=lifespan)
    app.include_router(api_routers.routes.router)
    return app
routes.py:
@router.post("/test2")
async def test2(request: Request):
model = request.state._state['transformer_pipeline']
return model("What is my name?","My name is Joe")
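(If I understand Starlette's lifespan-state handling correctly, the dict yielded from lifespan is shallow-copied into each request's scope, so plain attribute access like request.state.transformer_pipeline should also work, without reaching into the private _state dict.)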
But whenever the model is run (with a single worker), the print("in asynccontextmanager shutdown block") line gets executed, indicating the app gets shut down. Here is the full console output:
######### Start the app
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8080 (Press CTRL+C to quit)
######### Can hit endpoints that don't involve my
######### model, and they work just fine here.
...
######### Hit the /test2 endpoint (it still returns the payload to the client, which is interesting)
INFO: Shutting down
INFO: Waiting for connections to close. (CTRL+C to force quit)
INFO: 127.0.0.1:39138 - "POST /test2 HTTP/1.1" 200 OK
INFO: Waiting for application shutdown.
in asynccontextmanager shutdown block
INFO: Application shutdown complete.
INFO: Finished server process [126197]