This is quite a specific problem, but I'm wondering if anyone else has encountered it. I'm using the WhatsApp Cloud API (https://developers.facebook.com/docs/whatsapp/cloud-api/) for a question-answer chatbot. Incoming messages are passed to an LLM, which takes some time to respond.
Unfortunately, in the meantime the Meta API has sent me the same message a few more times. It seems that unless you respond with a 200 status code almost immediately, the Meta API will keep retrying the same message (see the docs under the "Retry" heading, and this previous Stack Overflow answer: WhatsApp cloud API sending old message inbound notification multiple time on my webhook).
What I've tried
My first approach was to use FastAPI's background task functionality. This lets me return a 200 response immediately and then run the LLM work as a background task. This works well insofar as it stops the repeated WhatsApp API calls. However, the LLM is very slow to respond, presumably because Cloud Run doesn't see the background task as an active request and throttles or shuts down the instance.
What I would prefer not to try
I know you can make Cloud Run "always on" by setting minimum instances to 1. That would presumably solve the background task problem, but I don't want to pay for a server that's constantly running when I'm not sure how much use it will get. It also rather defeats the purpose of Cloud Run.
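For the record, the settings in question would look something like this (a config sketch; `my-chatbot` is a placeholder service name — `--no-cpu-throttling` is the "CPU always allocated" option, which is what actually keeps the CPU available to background tasks outside of a request):

```shell
# Keep one instance warm, and keep CPU allocated outside of requests
gcloud run services update my-chatbot \
  --min-instances=1 \
  --no-cpu-throttling
```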
I could also split this into two microservices: one to receive the WhatsApp messages and immediately acknowledge receipt, and another to do the LLM work on each message. I'd rather avoid this, as it's a relatively simple codebase and I'd prefer not to split it into two services.
So.....
Is there any way to run this as a single service on Cloud Run while solving the problems above?