I have been exploring Polars for my web application. It's been impressive so far, until I hit an issue that has stalled my use of this awesome library. Use case: I read a parquet file into a Polars dataframe and use that dataframe to serve results for a GET request on FastAPI.
import polars as pl
from fastapi import FastAPI

fastApi = FastAPI()

@fastApi.get("/polars-test")
async def polars_test():
    polars_df = pl.read_parquet("/data/all_area_keys.parquet")
    df = polars_df.limit(3)
    return df.to_dicts()
polars = 0.16.2
pyarrow = 9.0.0
fastapi = 0.92.0
Base Docker image = tiangolo/uvicorn-gunicorn-fastapi:python3.11
When I package it into a Docker image and run the FastAPI app on gunicorn, this GET path does not respond. Hitting the endpoint from /docs just waits for several minutes and then the worker terminates, without any errors logged.
I am starting to think Polars' multithreading is not playing well with FastAPI's concurrency, but I am unable to find related documentation to understand it. Please help, I would absolutely hate to abandon Polars.
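To test the multithreading theory, one option would be to cap the Polars thread pool and see whether the hang changes. This is only a sketch: it relies on the `POLARS_MAX_THREADS` environment variable, which as far as I understand has to be set before Polars is imported, and the helper name `limit_polars_threads` is made up for illustration.

```python
import os
import sys


def limit_polars_threads(n: int = 1) -> bool:
    """Ask Polars for a thread pool of n threads.

    Returns False if polars has already been imported in this process,
    in which case setting the variable now would likely have no effect.
    (Hypothetical helper, not part of Polars or FastAPI.)
    """
    if "polars" in sys.modules:
        return False
    os.environ["POLARS_MAX_THREADS"] = str(n)
    return True


# Call this at the very top of the app module, before `import polars as pl`.
configured = limit_polars_threads(1)
```

If the endpoint responds with a single Polars thread, that would point toward a threading problem; if it still hangs, the cause is probably elsewhere.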
Troubleshooting done so far:
- The GET request works perfectly when I test it locally.
- Logged on to the running Docker container and ran the above pl commands: it works.
- Tried just printing the schema of the dataframe: it works. So the dataframe is created and its metadata is available. I get this issue only when I run a filter or any other transform on the Polars dataframe.
- Created a lazy frame and tried to collect, but no luck.
- Removed async from the method, no luck.
- Changed the Python version from 3.8 to 3.11, no luck.
- Specified the platform as linux/amd64 when running the Docker container, no luck.
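Since nothing is logged when the worker dies, a further diagnostic that might help is dumping the stack of every thread when the request has been stuck for a while. Below is a minimal sketch using the standard library's `faulthandler` module; the context manager name `dump_if_stuck` and the 30-second threshold in the usage comment are assumptions for illustration, not something from my current setup.

```python
import faulthandler
import sys
from contextlib import contextmanager


@contextmanager
def dump_if_stuck(seconds: float, file=None):
    """Write all threads' tracebacks to `file` if the wrapped block
    is still running after `seconds`. The process keeps running.
    (Hypothetical helper name.)
    """
    target = file if file is not None else sys.stderr
    faulthandler.dump_traceback_later(seconds, exit=False, file=target)
    try:
        yield
    finally:
        # Cancel the pending dump once the block finishes in time.
        faulthandler.cancel_dump_traceback_later()


# Usage inside the endpoint (sketch):
#
#     with dump_if_stuck(30):
#         df = polars_df.limit(3)
#         return df.to_dicts()
```

The dump goes to stderr by default, so it should appear in the container logs and show where the Python side of the worker is waiting.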