I have a Node.js script that scrapes URLs every day. The requests are throttled to be kind to the server, so the script runs for a fairly long time (several hours).
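To make the setup concrete, here is a minimal sketch of the kind of throttled loop I mean. The names `sleep`, `scrapeThrottled`, `fetchPage` and `delayMs` are placeholders for illustration, not the actual script.

```javascript
// Hypothetical helper: pause for the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Visit each URL in turn, waiting `delayMs` between consecutive requests.
// `fetchPage` is a placeholder for whatever the real script does with a URL.
async function scrapeThrottled(urls, fetchPage, delayMs) {
  const results = [];
  for (let i = 0; i < urls.length; i++) {
    // Throttle: wait before every request except the first one.
    if (i > 0) await sleep(delayMs);
    results.push(await fetchPage(urls[i]));
  }
  return results;
}
```

With made-up numbers such as 10,000 URLs and a 2-second delay, the waiting alone adds up to roughly five and a half hours, which is why the total run time is counted in hours rather than minutes.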
I have been looking for a way to deploy it on GCP. Because it previously ran under cron, I naturally looked at how to run a cron job on Google Cloud. However, according to the docs, the script has to be exposed as an API, and HTTP calls to that API can only run for up to 60 minutes, which doesn't fit my needs.
I had a look at this S.O. question, which recommends using a Cloud Function. However, I am not sure this approach would suit my case, as my script requires a lot more processing than the simple server-monitoring job described there.
Does anyone have experience doing this on GCP?
N.B.: To clarify, I want to avoid deploying it on a VPS.
Edit: I reached out to Google; here is their reply:
Thank you for your patience. Currently, it is not possible to run a cron script for 6 to 7 hours in a row, since the current limitation for cron in App Engine is 60 minutes per HTTP request. If it is possible for your use case, you can spread the 7 hours across recurring tasks, for example, every 10 minutes or 1 hour. A cron job request is subject to the same limits as those for push task queues. Free applications can have up to 20 scheduled tasks. You may refer to the documentation for cron schedule format.
Also, it is possible to still use Postgres and Redis with this. However, kindly take note that Postgres is still in beta.
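For anyone whose job can be split, the suggestion in the reply might look roughly like the sketch below: each scheduled invocation works through the URL list for a fixed time budget and saves a cursor so that the next invocation resumes where the previous one stopped. This is only an illustration under assumptions of mine. `runBatch`, `state`, `handleUrl` and `budgetMs` are hypothetical names, and `state` is a plain object standing in for persistent storage such as a database row.

```javascript
// Process URLs starting at `state.cursor` until the time budget runs out.
// `state` stands in for persistent storage; here it is a plain object.
// `now` is injectable so the clock can be replaced in tests.
async function runBatch(urls, state, handleUrl, budgetMs, now = Date.now) {
  const deadline = now() + budgetMs;
  while (state.cursor < urls.length && now() < deadline) {
    await handleUrl(urls[state.cursor]);
    state.cursor += 1; // advance only after the URL was handled
  }
  // True once the whole list has been processed.
  return state.cursor >= urls.length;
}
```

Each cron-triggered request would call `runBatch` with a budget comfortably below the 60-minute limit, then write `state` back to storage before responding.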
As I can't spread the task, I had to keep managing a Dokku VPS for this.