
I have a machine learning model inside a Docker image. I pushed the image to Google Container Registry and then deployed it in a Kubernetes pod. A FastAPI application runs on port 8000, and this FastAPI endpoint is public (call it mymodel:8000).

The structure of the FastAPI app is:

@app.get("/homepage")
async def get_homepage()

@app.get("/model")
async def get_modelpage()

@app.post("/model")
async def get_results(query: Form(...))

Users can submit queries and get results from the machine learning model running inside the container. I want to limit the number of queries made by all users combined. So if the query limit is 100, all users together can make at most 100 queries in total.

I thought of a way to do this:

Keep a database that stores the number of times the GET and POST methods have been called. As soon as the total number of POST calls crosses the limit, stop accepting any more queries.

Is there an alternative way of doing this using Kubernetes limits? For example, could I define a limit_api_calls such that the total number of times mymodel:8000 is accessed is at most limit_api_calls?

I looked at the documentation and could only find settings for CPU limits, memory limits, and rate limits.
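For reference, a minimal sketch of the counter approach described above. It keeps the count in process memory, which assumes a single replica; with several pods you would need shared storage (e.g. a Redis INCR) instead of this class. The names here are illustrative, not from any framework:

```python
import threading


class CallBudget:
    """Process-wide cap on the total number of calls (single-replica sketch)."""

    def __init__(self, limit: int):
        self.limit = limit
        self._count = 0
        self._lock = threading.Lock()

    def try_consume(self) -> bool:
        # Atomically take one unit of the budget; returns False once exhausted.
        with self._lock:
            if self._count >= self.limit:
                return False
            self._count += 1
            return True


budget = CallBudget(limit=100)

# Inside the FastAPI handler it would then be used roughly like this:
#
# @app.post("/model")
# async def get_results(query: str = Form(...)):
#     if not budget.try_consume():
#         raise HTTPException(status_code=429, detail="Query limit reached")
#     ...
```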

  • So you want your API to just stop working after 100 calls? Do you want K8S to clean up the deployment or do you want FastAPI to return an HTTPException? What exactly do you expect to happen after 100 calls, and when do you want to resume normal operations? – JarroVGIT Oct 18 '22 at 09:01
  • Does this answer your question? [FastAPI and SlowAPI limit request under all “path/\*”](https://stackoverflow.com/questions/71180148/fastapi-and-slowapi-limit-request-under-all-path) – Chris Oct 18 '22 at 10:00
  • It would be desirable if K8S could throw a warning that the total number of API calls has been exhausted and then clean up the deployment. – shubh gupta Oct 18 '22 at 10:00
  • @Chris yes that is a good way of achieving the logic inside fastapi. I was looking for a way to achieve this using Kubernetes Limits – shubh gupta Oct 18 '22 at 10:37

1 Answer


There are several approaches that could satisfy your needs.

  • Custom implementation: As you mentioned, keep the number of API calls received in a persistence layer and deny requests once the limit has been reached.
  • Use a service mesh: Istio (for instance) will let you limit the number of requests received and act as a circuit breaker.
  • Use an external API manager: Apigee will also let you limit and even charge your users; however, if it is only for internal use (not pay-per-use), I definitely wouldn't recommend it.
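On the service-mesh option: Envoy's local rate limit (which Istio exposes via an EnvoyFilter) is a token bucket over a time window, so it can only approximate a lifetime total, e.g. a bucket of 100 tokens refilled once per day rather than a hard all-time cap. A sketch, assuming the pods carry an `app: mymodel` label (resource names and labels here are assumptions):

```
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: mymodel-ratelimit
spec:
  workloadSelector:
    labels:
      app: mymodel          # assumed pod label
  configPatches:
    - applyTo: HTTP_FILTER
      match:
        context: SIDECAR_INBOUND
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: INSERT_BEFORE
        value:
          name: envoy.filters.http.local_ratelimit
          typed_config:
            "@type": type.googleapis.com/udpa.type.v1.TypedStruct
            type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
            value:
              stat_prefix: http_local_rate_limiter
              token_bucket:
                max_tokens: 100        # at most 100 requests...
                tokens_per_fill: 100
                fill_interval: 86400s  # ...per day, not per lifetime
              filter_enabled:
                runtime_key: local_rate_limit_enabled
                default_value:
                  numerator: 100
                  denominator: HUNDRED
              filter_enforced:
                runtime_key: local_rate_limit_enforced
                default_value:
                  numerator: 100
                  denominator: HUNDRED
```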

The tricky part is what you want to happen after the limit has been reached. If it is just a pod, you may exit the application so it finishes and gets cleaned up.
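One way to implement "exit the application" from inside the handler is to signal your own process; `request_shutdown` below is a hypothetical helper, not part of FastAPI. SIGTERM triggers the server's graceful shutdown, and a bare pod (e.g. `restartPolicy: Never`) is not recreated once the container exits, whereas a Deployment would restart it:

```python
import os
import signal
from typing import Optional


def request_shutdown(pid: Optional[int] = None) -> None:
    """Send SIGTERM (to our own process by default) to trigger a graceful shutdown."""
    os.kill(os.getpid() if pid is None else pid, signal.SIGTERM)
```

In the POST handler you would return the final response first, then schedule `request_shutdown()` (e.g. via a background task) once the budget is exhausted, so the last caller still gets their result.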

Otherwise, if you have a Deployment with its ReplicaSet and several resources associated with it (like ConfigMaps), you probably want some kind of asynchronous alert or polling check to clean up everything related to your deployment. You may want to take a deep look at orchestrators like Airflow (Cloud Composer) and use tools such as Helm to keep deployments manageable.

Sai Chandra Gadde
CarlesCF