
What is the difference between deploying a dockerized FastAPI app with plain Uvicorn versus Tiangolo's Gunicorn+Uvicorn image? And why do my results show better performance when deploying with Uvicorn alone than with Gunicorn+Uvicorn?

Tiangolo's documentation says:

You can use Gunicorn to manage Uvicorn and run multiple of these concurrent processes. That way, you get the best of concurrency and parallelism.

From this, can I assume that using Gunicorn would give better results?

This is my test using JMeter. I deployed my app to Google Cloud Run, and these are the results:

Using Python and Uvicorn:

[JMeter results screenshot: average/min/max response times for the Uvicorn deployment]

Using Tiangolo's Gunicorn+Uvicorn:

[JMeter results screenshot: average/min/max response times for the Gunicorn+Uvicorn deployment, showing errors]

This is my Dockerfile for Python (Uvicorn):

FROM python:3.8-slim-buster

# System packages (libgl1-mesa-dev is typically needed by imaging libraries such as OpenCV)
RUN apt-get update --fix-missing && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1-mesa-dev python3-pip git

WORKDIR /usr/src/app

# Install Python dependencies first so this layer is cached between builds
COPY ./requirements.txt /usr/src/app/requirements.txt
# The 2020 resolver is the default since pip 20.3, so --use-feature=2020-resolver
# is unnecessary (and rejected by newer pip versions)
RUN pip3 install -U setuptools && \
    pip3 install --upgrade pip && \
    pip3 install -r ./requirements.txt

COPY . /usr/src/app

# Runs a single Uvicorn process via main.py (see the sketch below)
CMD ["python3", "/usr/src/app/main.py"]

This is my Dockerfile for Tiangolo's Gunicorn+Uvicorn:

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.8-slim

# Build tools for packages that compile native extensions
RUN apt-get update && apt-get install -y wget gcc

WORKDIR /app

# Install Python dependencies first so this layer is cached between builds
COPY ./requirements.txt /app/requirements.txt
RUN python -m pip install --upgrade pip && \
    pip install --no-cache-dir -r /app/requirements.txt

COPY . /app
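Note that this base image starts Gunicorn with Uvicorn workers on its own (there is no CMD here), and the worker count can be tuned through environment variables described in the image's README. A hypothetical example; the value 2 is arbitrary:

# Pin the number of Gunicorn workers instead of letting the image
# derive it from the CPU count (hypothetical tuning example)
ENV WEB_CONCURRENCY=2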

You can see the errors in the Gunicorn+Uvicorn results. Are they caused by Gunicorn?

Edit:

In my case, I use a lazy-loading method to load my machine learning model. This is the class that loads the model.

import os

import joblib


class MyModelPrediction:
    def __init__(self, brand):
        self.brand = brand

    def load_model(self):
        pathfile_model = os.path.join("modules", "model/")
        brand = self.brand.lower()
        top5_brand = ["honda", "toyota", "nissan", "suzuki", "daihatsu"]

        # Brands outside the top 5 share a single fallback model file
        if brand not in top5_brand:
            brand = "ex_Top5"

        # Unpickle the estimator for this brand
        with open(pathfile_model + f'{brand}_all_in_one.pkl', 'rb') as file:
            model = joblib.load(file)

        return model
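For illustration, calling this class directly looks like the following (a hypothetical usage sketch; "Honda" is just an example brand):

# Hypothetical usage: unpickle the model for one brand
predictor = MyModelPrediction("Honda")
model = predictor.load_model()  # returns the unpickled estimator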

And this is the endpoint for my API.

@router.post(
    "/predict",
    response_model=schemas.ResponsePrediction,
    responses={
        422: schemas.responses_dict[422],
        400: schemas.responses_dict[400],
        500: schemas.responses_dict[500],
    },
    tags=["predict"],
    response_class=ORJSONResponse,
)
async def detect(
    *,
    # db: Session = Depends(deps.get_db_api),
    car: schemas.Car = Body(...),
    customer_id: str = Body(None, title='Customer unique identifier')
) -> Any:
    """
    Predict price for used vehicle.\n
    """
    global list_detections
    try:
        start_time = time.time()
        brand = car.dict()['brand']
        obj = MyModelPrediction(brand)

        top5_brand = ["honda", "toyota", "nissan", "suzuki", "daihatsu"]
        if brand not in top5_brand:
            brand = "non"

        # Lazily load the model into the module-level cache on first use;
        # .get() avoids a KeyError when the brand has not been cached yet
        if not usedcar.price_engine_4w.get(brand):
            usedcar.price_engine_4w[brand] = obj.load_model()
            print("Load success")

        elapsed_time = time.time() - start_time
        print(usedcar.price_engine_4w)
        print("ELAPSED MODEL TIME : ", elapsed_time)

        list_detections = await get_data_model(**car.dict())

        if list_detections is None:
            result_data = None
        else:
            result_data = schemas.Prediction(**list_detections)
            result_data = result_data.dict()

    except Exception as e:  # noqa
        raise HTTPException(
            status_code=500,
            detail=str(e),
        )
    else:
        # Guard against a missing result as well as a zero prediction
        if result_data is None or result_data['prediction_price'] == 0:
            raise HTTPException(
                status_code=400,
                detail="The system cannot process your request",
            )
        else:
            result = {
                'code': 200,
                'message': 'Successfully fetched data',
                'data': result_data
            }

    return schemas.ResponsePrediction(**result)
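For reference, a request to this endpoint would look roughly like this (a hypothetical client-side sketch; schemas.Car presumably has more fields than brand, which is the only one visible above):

import requests  # hypothetical client-side sketch

response = requests.post(
    "http://localhost:8080/predict",
    json={"car": {"brand": "honda"}, "customer_id": "cust-001"},
)
print(response.json())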
  • What are the average/min/max values in your table? Response times? – Gino Mempin Feb 26 '21 at 09:02
  • @Gino Yes, those are the average/min/max response times that I got – MADFROST Feb 27 '21 at 03:03
  • Can you also post a sample endpoint that you used for testing? I assume your tests are hitting some endpoint many times. Also, it might help to understand that Gunicorn is there to parallelize your processes, using workers. That is different from concurrency. See https://fastapi.tiangolo.com/async/ – Gino Mempin Feb 27 '21 at 07:57
  • @GinoMempin I've updated my question; my endpoint uses `async def`. So is `async def` not suitable for parallel processing? – MADFROST Feb 27 '21 at 09:05
  • @GinoMempin Okay, when I read the documentation, it explained that if you are using a Machine Learning model you can use [Concurrency + Parallelism](https://fastapi.tiangolo.com/async/#concurrency-parallelism-web-machine-learning); it says to use `async def` and `await`. If we refer to [tiangolo-gunicorn-uvicorn](https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker#gunicorn) as explained before, it should go well, right? – MADFROST Feb 27 '21 at 11:15
  • Read https://stackoverflow.com/help/minimal-reproducible-example. – aaron Oct 03 '21 at 09:26

1 Answer


Gunicorn is an application server that interacts with your web application using the WSGI protocol. This means that Gunicorn can serve applications written in synchronous web frameworks such as Flask or Django (more so for versions released before 2021). It works by creating and maintaining a configurable number of instances of your application (workers) that serve HTTP requests from clients. The role of the Gunicorn master process is to make sure that the number of workers stays at the number defined in the settings, so if any worker dies, the master process starts another one. Gunicorn by itself is not compatible with FastAPI, because FastAPI uses the newer ASGI standard.
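For contrast, a plain WSGI deployment is typically launched like this (a hypothetical sketch; the module and app names are assumptions):

# Gunicorn managing 4 synchronous WSGI workers (e.g. for a Flask app)
gunicorn --workers 4 --bind 0.0.0.0:8000 myproject.wsgi:app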

Uvicorn is an application server that supports the ASGI protocol. However, its capabilities as a process (worker) manager leave much to be desired.

But Uvicorn has a Gunicorn-compatible worker class. Using that combination, Gunicorn acts as the process manager and as the server accepting incoming HTTP requests, while the worker layer ensures ASGI compatibility when passing data to your application.
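Concretely, that combination is typically launched like this (a sketch; the module path main:app, the worker count, and the port are assumptions):

# Gunicorn as the process manager, with Uvicorn's worker class speaking ASGI
gunicorn main:app --workers 4 --worker-class uvicorn.workers.UvicornWorker --bind 0.0.0.0:8080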

If you have a cluster of machines with Kubernetes, Docker Swarm or another similar complex system to manage distributed containers on multiple machines, then you will probably want to handle replication at the cluster level instead of using a process manager (like Gunicorn with workers) in each container. One of those distributed container management systems like Kubernetes normally has some integrated way of handling replication of containers while still supporting load balancing for the incoming requests. All at the cluster level. In those cases, you would probably want to build a Docker image from scratch, installing your dependencies, and running a single Uvicorn process instead of running something like Gunicorn with Uvicorn workers.
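In that cluster-level setup, the image boils down to something like this (a minimal sketch, assuming the FastAPI app object lives in main.py):

FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# A single Uvicorn process; replication and load balancing are handled by the cluster
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]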

  • And the issue of ASGI support in gunicorn has remained open for 6 years now ([here](https://github.com/benoitc/gunicorn/issues/1380) it is, should anyone wish to contribute :) – mirekphd Dec 17 '22 at 08:44
  • Please note however that k8s pods are not really practical substitutes for gunicorn workers. To name one problem: the limits on the number of pods a node can handle are much lower (250-1000) than the number of processes per user (up to 4.2 million). – mirekphd Dec 17 '22 at 08:54
  • @mirekphd And gunicorn is not alone. In 2023, there are still blocking calls in the standard library with no `async` alternatives. I guess this says more about the `async` design itself than about tools like gunicorn. – satoru Feb 13 '23 at 12:32
  • @mirekphd Your example is right, but I don't think it matters much in practice. Firstly, we won't use so many processes that it causes thrashing. Secondly, we could have many nodes for a webserver deployment, so we could eventually reach a million processes. – whitehatboxer Aug 25 '23 at 06:58