I'm developing a Python API server with FastAPI and Uvicorn. After switching from Uvicorn to Gunicorn, I noticed that some functions became very slow. This seems to happen when dealing with large DataFrames.
For example, dummy.csv has a "date" string column and almost 88,000 rows. I want to change the string format from YYYY-MM-DD to YYYY-MM, so I used pandas.apply() with strftime() on each row. With Uvicorn this doesn't take long, but with Gunicorn it takes almost 10 times longer...
I know that pandas apply() is slow, but I don't understand why there is such a difference in time spent between Gunicorn and Uvicorn.
Here's a specific example.
My code
import time

import pandas as pd
from fastapi import FastAPI

app = FastAPI()


@app.get("/pd")
def panda():
    df = pd.read_csv("./dummy.csv")
    df["date"] = pd.to_datetime(df.date)
    print(df)

    start = time.time()
    df["date"] = df["date"].apply(lambda x: x.strftime("%Y-%m"))
    end = time.time()
    print("[TIME]", end - start)
    print(df)
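(As an aside: I know the conversion itself could be done without a row-wise apply(), for example with the vectorized .dt accessor, roughly like the sketch below. My question is about the Uvicorn/Gunicorn timing difference, not about speeding up the conversion itself.)

import pandas as pd

# Sketch of the same YYYY-MM-DD -> YYYY-MM conversion, vectorized instead of apply()
df = pd.read_csv("./dummy.csv")
df["date"] = pd.to_datetime(df.date)
df["date"] = df["date"].dt.strftime("%Y-%m")  # vectorized formatting via the .dt accessor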
Command
uvicorn main:app
gunicorn -k uvicorn.workers.UvicornWorker main:app --bind 0.0.0.0:8000
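(For reference, in case the bind address matters for the comparison, the Uvicorn invocation with the same host/port would be roughly:)

uvicorn main:app --host 0.0.0.0 --port 8000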
Packages
python==3.10.6
fastapi==0.95
pandas==1.4.2
uvicorn==0.19.0
uvloop==0.17.0
gunicorn==20.1.0
...
Result
Uvicorn
takes 0.23 sec
date
0 2010-01-04
1 2010-01-04
2 2010-01-04
3 2010-01-04
4 2010-01-04
... ...
88282 2019-12-31
88283 2019-12-31
88284 2019-12-31
88285 2019-12-31
88286 2019-12-31
[88287 rows x 1 columns]
[TIME] 0.23171710968017578
date
0 2010-01
1 2010-01
2 2010-01
3 2010-01
4 2010-01
... ...
88282 2019-12
88283 2019-12
88284 2019-12
88285 2019-12
88286 2019-12
[88287 rows x 1 columns]
Gunicorn
takes 3.28 sec
date
0 2010-01-04
1 2010-01-04
2 2010-01-04
3 2010-01-04
4 2010-01-04
... ...
88282 2019-12-31
88283 2019-12-31
88284 2019-12-31
88285 2019-12-31
88286 2019-12-31
[88287 rows x 1 columns]
[TIME] 3.2823679447174072
date
0 2010-01
1 2010-01
2 2010-01
3 2010-01
4 2010-01
... ...
88282 2019-12
88283 2019-12
88284 2019-12
88285 2019-12
88286 2019-12
[88287 rows x 1 columns]
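In case it helps to reproduce this outside a web server, the same transformation can be timed as a plain script (a minimal sketch, assuming the same dummy.csv):

import time

import pandas as pd

# Standalone repro: time the same apply()/strftime() conversion with no server involved
df = pd.read_csv("./dummy.csv")
df["date"] = pd.to_datetime(df.date)

start = time.time()
df["date"] = df["date"].apply(lambda x: x.strftime("%Y-%m"))
print("[TIME]", time.time() - start)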
Why did this happen? And what did I do wrong?