If I understand correctly, you have an application X that should connect to an API Y, and X can’t send more than 5 requests per second to Y.
This is a complex and incomplete scenario. Let me ask a few questions:
- What is the expected load on X? If it is below 5 requests per second, you are fine.
- What is the timeout on X? Imagine you receive 50 requests per second; in this scenario some requests may take 10 seconds to answer. Is that acceptable?
- In the case of a timeout in X, will the client just retry?
- What happens if you send Y more than 5 requests per second?
- Is the response from Y cacheable?
- Do you have multiple servers / autoscale?
One possibility is to set a rate limiter on the application to match the limit on the API.
Another is to simply call the API as fast as you need to. If a call fails because of too many requests, you can implement retry logic or give up.
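To illustrate the retry option, here is a minimal sketch. It assumes your HTTP client raises some error when Y answers with HTTP 429 (the `RateLimitError` class and `call_api` callable are placeholders, not a real library API):

```python
import random
import time

class RateLimitError(Exception):
    """Hypothetical error your HTTP client raises on an HTTP 429 from Y."""

def call_with_retry(call_api, max_attempts=5, base_delay=1.0):
    """Run call_api(), retrying with exponential backoff when rate-limited."""
    for attempt in range(max_attempts):
        try:
            return call_api()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: give up and let the caller decide
            # exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

The jitter spreads retries out so many clients that fail at the same moment don’t all hammer Y again at the same moment.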
If you need to be very careful with this API for some reason, and you don’t need to run multiple instances / autoscale, the solution is to use a rate limiter in the application.
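A single-instance rate limiter can be very small. This is a sketch, not a production implementation: it just spaces calls at least 1/rate seconds apart, blocking the caller when necessary:

```python
import threading
import time

class RateLimiter:
    """Blocking limiter: allows at most `rate` calls per second (sketch)."""

    def __init__(self, rate=5):
        self.min_interval = 1.0 / rate   # 0.2s between calls for 5 req/s
        self.lock = threading.Lock()
        self.next_allowed = time.monotonic()

    def acquire(self):
        """Block until the next call is allowed, then reserve that slot."""
        with self.lock:
            now = time.monotonic()
            wait = self.next_allowed - now
            self.next_allowed = max(now, self.next_allowed) + self.min_interval
        if wait > 0:
            time.sleep(wait)

# usage: call limiter.acquire() right before every request to Y
limiter = RateLimiter(rate=5)
```

In practice you would probably reach for an existing library instead of writing this yourself, but the idea is the same.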
If you need to run several instances, you need something that centralizes access to this API, and that is a very delicate thing… it is a single point of failure. You can implement a token system that delivers only 5 tokens per second; once you have a token, you can call the API. That is one possibility.
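The token idea can be sketched like this. In a real multi-instance setup the dispenser would live in one shared service (or something like Redis) that every instance asks for a token; here it is a single-process illustration of the mechanism only, under that assumption:

```python
import queue
import threading
import time

class TokenDispenser:
    """Central token service sketch: issues `rate` tokens per second."""

    def __init__(self, rate=5):
        self.tokens = queue.Queue(maxsize=rate)  # hold at most 1s worth
        self.interval = 1.0 / rate
        threading.Thread(target=self._refill, daemon=True).start()

    def _refill(self):
        while True:
            try:
                self.tokens.put_nowait(object())  # drop token if bucket is full
            except queue.Full:
                pass
            time.sleep(self.interval)

    def acquire(self, timeout=None):
        """Block until a token is available; one token = one call to Y."""
        self.tokens.get(timeout=timeout)
```

Note the trade-off: every instance now depends on this one component, which is exactly the single point of failure mentioned above.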
There is no free lunch. Each solution has pros and cons. But if you can avoid sending requests to the API at all (for example, by caching the results) or add messages to a queue if you only need to store the data (and process it with an async program)… perhaps it will be easier to discuss a better solution.
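If caching is an option, even a tiny time-based cache cuts the pressure on Y dramatically. A minimal sketch (the `ttl` value and the `fetch` callable are assumptions; in production you would pick a TTL that matches how stale Y's data is allowed to be):

```python
import time

class TTLCache:
    """Tiny time-based cache: skip calling Y again within `ttl` seconds."""

    def __init__(self, ttl=30.0):
        self.ttl = ttl
        self.store = {}  # key -> (expires_at, value)

    def get_or_call(self, key, fetch):
        now = time.monotonic()
        entry = self.store.get(key)
        if entry and entry[0] > now:
            return entry[1]            # still fresh: no request to Y
        value = fetch()                # cache miss: one real call to Y
        self.store[key] = (now + self.ttl, value)
        return value
```

With a 30-second TTL, 50 identical requests per second to X become at most one request to Y every 30 seconds for that key.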