
This is quite a specific problem but I'm wondering if anyone else has encountered it. I'm using the WhatsApp Cloud API (https://developers.facebook.com/docs/whatsapp/cloud-api/) for a question-answer chatbot. These messages are received by an LLM, which takes some time to respond.

Unfortunately, in the meantime, the Meta API sends me the same message a few more times. It seems that unless you respond with a 200 status code almost immediately, the Meta API will keep spamming you with the same message (see the docs under the "Retry" heading, and this previous Stack Overflow answer: WhatsApp cloud API sending old message inbound notification multiple time on my webhook).
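One common mitigation for these retries is to deduplicate inbound webhooks by WhatsApp message ID before doing any expensive work. This is a hypothetical sketch (the function name and in-memory set are mine, not from the question); a real deployment on Cloud Run would persist seen IDs in a database, since instances are recycled:

```python
# Hypothetical dedup guard: Meta's retries carry the same message id
# (the "wamid." value in the webhook payload), so remember ids we've
# already processed and skip repeats.
seen_ids: set[str] = set()

def is_duplicate(message_id: str) -> bool:
    """Return True if this message id was already handled."""
    if message_id in seen_ids:
        return True
    seen_ids.add(message_id)
    return False
```

Note this alone doesn't stop the retries (only a fast 200 does), but it prevents the LLM from being invoked multiple times for the same message.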

What I've tried

My first approach was to use FastAPI's background task functionality. This allows me to immediately return a 200 response and then do the LLM work as a background process. This works well insofar as it stops the repeated WhatsApp API calls. However, the LLM is very slow to respond, presumably because Cloud Run does not see the background task and therefore throttles or shuts down the instance.

What I would prefer not to try

I know you can set Cloud Run to be "always on" by setting the minimum number of instances to 1. That would presumably solve the background task problem, but I don't want to pay for a server that's constantly on when I'm not sure how much use it will get. It also somewhat defeats the purpose of Cloud Run.

I could also have two microservices: one to receive the WhatsApp messages and immediately acknowledge receipt, and another to receive each message and do the LLM work. I want to avoid this, as it's a relatively simple codebase and I'd prefer not to split it into two services.

So.....

Is there any way to have this running as a single service on Cloud Run, while solving the problems I mentioned?

millsy
  • As per [this conversation](https://stackoverflow.com/q/70334627/18265570), I believe `CPU always allocated` can cost less than `CPU only allocated during request processing` – Roopa M May 25 '23 at 12:27
  • IMHO Cloud Run is not the correct service to deploy your app on. There are a lot of future factors that you will need to manage (long running requests, request management, timeouts, retries, error and response management, etc.). Because of that use a compute service such as Compute Engine. Stateless environments are not designed to manage that. – John Hanley May 25 '23 at 21:14
  • Why wouldn't it be the best option? All the FastAPI backend tutorials use Cloud Run – Ege Jun 23 '23 at 10:49
  • Yeah just to respond @JohnHanley it's working very well now - the state is just stored in a database and retrieved at the start of the API call. There are some massive advantages to not having to maintain a VM IMO! – millsy Jul 20 '23 at 16:44
  • Cloud Run is one of the best cloud services of all time. I am glad that you solved your problem. – John Hanley Jul 20 '23 at 19:12

1 Answer


To answer my own question: there is a setting to have CPU always allocated while the container instance is active (with a maximum idle lifetime of 15 minutes). See here: https://cloud.google.com/blog/products/serverless/cloud-run-gets-always-on-cpu-allocation

I hadn't realised that this is a different setting from minimum instances: the container can still be terminated when inactive, but while it's alive the CPU stays allocated, so background tasks keep running after the response is sent.
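This setting can be enabled from the CLI with the `--no-cpu-throttling` flag (the service name and region below are placeholders):

```shell
# Switch an existing Cloud Run service to "CPU always allocated",
# so CPU is not throttled once the HTTP response has been sent.
gcloud run services update MY_SERVICE \
  --no-cpu-throttling \
  --region=europe-west1
```

It can also be toggled in the Cloud Console under the service's CPU allocation settings.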

millsy
  • Hi, I tried the "CPU always allocated" setting, but it didn't change the response time; responses are still duplicated and arrive 4–5 minutes later. How did you solve that? – Ege Jun 23 '23 at 10:48
  • My guess would be that your code is not properly running asynchronously. Have you actually tested the response time when you call your API endpoint? It should be in the milliseconds to prevent the Meta API from resending the message. – millsy Jul 20 '23 at 16:42