
I'm facing an issue where I have a Python script that needs to run every day at 00:00 on Google Cloud, possibly using Google Cloud Run. What I'd like to know is something quite specific that I couldn't find a good answer to: which way is technically best to achieve this? Is it better to let the cloud trigger the script at a certain time, or to have an always-running container that waits (using locks) for a certain time of the day to come and then runs a function? The task the script performs is quite heavy: it scans pictures and tries to extract plain text from them (the images are downloaded from an Instagram page).

As I've never implemented such a thing in a cloud environment, what I need to know boils down to this:
How much heavier is a "lock waiting" script compared to a cloud-managed scheduler (e.g. Google Cloud Scheduler)? Economically speaking, does it matter at all for heavy tasks like the ones in my script?
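
To make the second option concrete, the "always running container" I have in mind would look roughly like this sketch (process_instagram_images is just a placeholder for my actual download + OCR job):

```python
import time
from datetime import datetime, timedelta


def process_instagram_images():
    # Placeholder for the real job: download images from the Instagram
    # page and run OCR on them to extract plain text.
    ...


def seconds_until_midnight():
    # Compute how long to sleep until the next 00:00.
    now = datetime.now()
    next_run = (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return (next_run - now).total_seconds()


while True:
    # The container sits idle (but still allocated and billed) until 00:00.
    time.sleep(seconds_until_midnight())
    process_instagram_images()
```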

2 Answers


I think Cloud Scheduler may be a good first solution/approach. It can, for example, make an HTTP request, or push a message into a Pub/Sub topic (which can be used as a pull or push trigger for your script).

By "the script" I mean whatever functionality is required. It can be implemented in many different ways: as a Cloud Function (or a group of different Cloud Functions working together to achieve one goal), as a Cloud Run service, or anything else.

My usual personal preference is the pattern Cloud Scheduler => Pub/Sub topic => push Cloud Function. Other people may prefer other variations.
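
Purely for illustration, a Pub/Sub-triggered Python Cloud Function for this pattern could look roughly like the sketch below. It assumes the 1st-gen background-function signature, and handle_scheduler_message / run_ocr_job are names I made up as placeholders for your actual processing:

```python
import base64


def handle_scheduler_message(event, context):
    """Background Cloud Function triggered by the Pub/Sub message
    that Cloud Scheduler publishes on the topic."""
    # The Scheduler payload arrives base64-encoded in event["data"].
    payload = base64.b64decode(event.get("data", "")).decode("utf-8")
    print(f"Triggered by Cloud Scheduler, payload: {payload}")
    run_ocr_job()  # Placeholder for the actual image download + OCR logic.


def run_ocr_job():
    ...
```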

The choice of solution (including the "script" implementation) in your case depends, I think, on functional and non-functional requirements, context, scope, the skills and knowledge of the people who are to develop and maintain the solution, time, CAPEX and OPEX budget, etc.

al-dann
  • So what you are suggesting here is basically having a Cloud Function trigger some app/container (via HTTP/Pub/Sub) that runs the *to be scheduled* script, am I right? But if I have to make my script always listen for, e.g., an HTTP call, isn't it just better to schedule the script itself using, in the case of Python, something like https://pypi.org/project/schedule/? Many thanks. – Roberto Montalti Nov 03 '21 at 18:32
  • Not really... The idea is that a Cloud Function should not trigger something else (i.e. a script, a container, etc.); rather, the functional script itself is to be implemented as a Cloud Function (or a group of Cloud Functions working together), or as a Cloud Run service, and so on. In a nutshell, the "script" (or whatever functionality is required) is to be triggered from the Pub/Sub topic, either using a pull subscription or a push. – al-dann Nov 03 '21 at 19:12
  • Thanks a lot, now it's much clearer, and I think it's the solution I will adopt for this and future projects. – Roberto Montalti Nov 06 '21 at 12:38

I don't know if this is technically the best, but I would go with a combination of Cloud Run and Cloud Scheduler (we currently have this combination running for one of our projects).

Cloud Run because your script seems to run just once a day, and Cloud Run will basically go to sleep when it is not serving a request. This makes for a lower overall cost, i.e. it wakes up when it receives a request, executes it, and goes back to sleep (no charge to you while it is sleeping).

Cloud Scheduler to trigger the URL endpoint on Cloud Run at 00:00. As the name implies, Scheduler schedules jobs to run at specific times.
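
As a rough sketch of the Cloud Run side (Flask is just one option, and run_ocr_job stands in for your actual script):

```python
import os

from flask import Flask

app = Flask(__name__)


def run_ocr_job():
    # Stand-in for the real work: download the images and run OCR on them.
    ...


@app.route("/run-job", methods=["POST"])
def run_job():
    # Cloud Scheduler calls this endpoint once a day at 00:00.
    run_ocr_job()
    return "Job finished", 200


if __name__ == "__main__":
    # Cloud Run injects the port to listen on via the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```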

I would also suggest securing your URL endpoint (the one that will be deployed on Cloud Run). This ensures only your Cloud Scheduler job can trigger the URL (someone cannot 'mistakenly' access the URL over the internet and trigger the job unless they have the necessary privilege). We have a blog article about how to do this.
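
I won't reproduce the article here, but one common approach (a sketch under assumptions, not our exact implementation) is to give the Cloud Scheduler job an OIDC service-account identity and verify the token on the Cloud Run side. The audience URL and service-account email below are made-up placeholders:

```python
from flask import abort, request
from google.auth.transport import requests as google_requests
from google.oauth2 import id_token

# Both values are placeholders for illustration only.
EXPECTED_AUDIENCE = "https://my-service-xxxxx-uc.a.run.app/run-job"
SCHEDULER_SA = "scheduler-invoker@my-project.iam.gserviceaccount.com"


def verify_scheduler_request():
    # Reject requests that do not carry a Bearer token.
    auth_header = request.headers.get("Authorization", "")
    if not auth_header.startswith("Bearer "):
        abort(401)
    token = auth_header.split(" ", 1)[1]
    try:
        # Validate the OIDC token's signature, expiry and audience.
        claims = id_token.verify_oauth2_token(
            token, google_requests.Request(), audience=EXPECTED_AUDIENCE
        )
    except ValueError:
        abort(401)
    # Only accept tokens issued to the Scheduler's service account.
    if claims.get("email") != SCHEDULER_SA:
        abort(403)
```

A simpler alternative is to deploy the Cloud Run service with authentication required and grant the Scheduler's service account the Cloud Run Invoker role, so the platform performs this check for you.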

NoCommandLine