
I'm developing a Python API server with FastAPI and Uvicorn. After switching from Uvicorn to Gunicorn, I found that some functions became very slow. This seems to happen when dealing with large DataFrames.

For example, dummy.csv has a "date" string column and almost 88,000 rows, and I want to change the string format from YYYY-MM-DD to YYYY-MM. So I used pandas.apply() with strftime() on each row. With Uvicorn, that doesn't take long. But with Gunicorn, it takes almost 10 times longer...

I know that pandas apply() is slow. But I don't understand why there is a difference in time spent between Gunicorn and Uvicorn.

Here's a specific example.

My code

import time
import pandas as pd
from fastapi import FastAPI

app = FastAPI()


@app.get("/pd")
def panda():
    df = pd.read_csv("./dummy.csv")

    df["date"] = pd.to_datetime(df.date)
    print(df)

    start = time.time()
    df["date"] = df["date"].apply(lambda x: x.strftime("%Y-%m"))
    end = time.time()
    print("[TIME]", end - start)

    print(df)

Command

uvicorn main:app
gunicorn -k uvicorn.workers.UvicornWorker main:app --bind 0.0.0.0:8000

Packages

python==3.10.6
fastapi==0.95
pandas==1.4.2
uvicorn==0.19.0
uvloop==0.17.0
gunicorn==20.1.0
...

Result

Uvicorn

takes 0.23 sec

            date
0     2010-01-04
1     2010-01-04
2     2010-01-04
3     2010-01-04
4     2010-01-04
...          ...
88282 2019-12-31
88283 2019-12-31
88284 2019-12-31
88285 2019-12-31
88286 2019-12-31

[88287 rows x 1 columns]
[TIME] 0.23171710968017578
          date
0      2010-01
1      2010-01
2      2010-01
3      2010-01
4      2010-01
...        ...
88282  2019-12
88283  2019-12
88284  2019-12
88285  2019-12
88286  2019-12

[88287 rows x 1 columns]

Gunicorn

takes 3.28 sec

            date
0     2010-01-04
1     2010-01-04
2     2010-01-04
3     2010-01-04
4     2010-01-04
...          ...
88282 2019-12-31
88283 2019-12-31
88284 2019-12-31
88285 2019-12-31
88286 2019-12-31

[88287 rows x 1 columns]
[TIME] 3.2823679447174072
          date
0      2010-01
1      2010-01
2      2010-01
3      2010-01
4      2010-01
...        ...
88282  2019-12
88283  2019-12
88284  2019-12
88285  2019-12
88286  2019-12

[88287 rows x 1 columns]

Why did this happen? And what did I do wrong?

Wapar
  • [This answer](https://stackoverflow.com/a/70719352/17865804) might not explain the behaviour described in your question, but it would give you a couple of solutions when dealing with large datasets. – Chris Apr 20 '23 at 11:33

1 Answer


This will improve the performance of your code, but it might not answer your question directly. Try measuring the performance again after this change.

Replace this line

df["date"] = df["date"].apply(lambda x: x.strftime("%Y-%m"))

with this:

df["date"] = df["date"].dt.strftime("%Y-%m")
Eyad Sibai