For anyone looking for a solution to submit training jobs on a schedule, here is my solution after trying a few approaches. I tried:
- Running the job through Cloud Composer using Airflow
- Starting the job with a startup script
- Using a cron schedule with Cloud Scheduler, Pub/Sub and a Cloud Function

The easiest and most cost-effective way is to use Cloud Scheduler with a Cloud Function that calls the AI Platform client library.
step 1 - create a Pub/Sub topic (for example start-training)

step 2 - create a cron job with Cloud Scheduler targeting the start-training topic
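
If you prefer the CLI over the console, steps 1 and 2 can also be done with gcloud; the scheduler job name start-training-daily, the schedule (02:00 daily) and the time zone here are placeholders, adjust them to your setup:

gcloud pubsub topics create start-training
gcloud scheduler jobs create pubsub start-training-daily \
    --schedule="0 2 * * *" \
    --topic=start-training \
    --message-body="start" \
    --time-zone="Asia/Tokyo"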

step 3 - create a Cloud Function with trigger type Cloud Pub/Sub, topic start-training and entry point submit_job. This function submits a training job to AI Platform through the Python client library.
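
For reference, the function can be deployed with gcloud from the folder containing main.py and requirements.txt; the runtime, region and submit_job entry point below match the rest of this answer, adjust as needed:

gcloud functions deploy submit_job \
    --runtime=python37 \
    --trigger-topic=start-training \
    --entry-point=submit_job \
    --region=asia-northeast1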
Now we have this beautiful DAG
Cloud Scheduler -> Pub/Sub -> Cloud Function -> AI Platform
The Cloud Function code goes like this:
main.py
import datetime
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

project_id = '<PROJECT ID>'   # must be the project ID, not the project name
bucket_name = "<BUCKET NAME>"
project_path = 'projects/{}'.format(project_id)


def submit_job(event, context):
    # Generate a unique job ID per invocation; AI Platform rejects duplicate job IDs,
    # and module-level globals are reused across warm Cloud Function invocations.
    job_name = "training_" + datetime.datetime.now().strftime("%y%m%d_%H%M%S")

    training_inputs = {
        'scaleTier': 'BASIC',
        'packageUris': [f"gs://{bucket_name}/package/trainer-0.1.tar.gz"],
        'pythonModule': 'trainer.task',
        'region': 'asia-northeast1',
        'jobDir': f"gs://{bucket_name}",
        'runtimeVersion': '2.2',
        'pythonVersion': '3.7',
    }
    job_spec = {"jobId": job_name, "trainingInput": training_inputs}

    # Build the AI Platform (ml v1) client and submit the training job
    cloudml = discovery.build("ml", "v1", cache_discovery=False)
    request = cloudml.projects().jobs().create(body=job_spec, parent=project_path)
    response = request.execute()
    return response
requirements.txt
google-api-python-client
oauth2client
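
To test the whole flow without waiting for the cron schedule, you can publish a message to the topic yourself and then check that a training job was created (both commands assume the names used above):

gcloud pubsub topics publish start-training --message="test"
gcloud ai-platform jobs list --limit=5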
Important
- Make sure to use the project ID, not the project name, otherwise you will get a permission error.
- If you get an "ImportError: file_cache is unavailable when using oauth2client..." error, pass cache_discovery=False to the build function; otherwise leave it out so the client uses the cache for performance.
- Point packageUris to the correct GCS location of your source package. In this case my package is named trainer, built and uploaded to the package folder of the bucket, and its main module is task.
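
If you have not built the package yet, a minimal sketch, assuming a standard setup.py with name='trainer' and version='0.1', is to create a source distribution and copy it to the package folder of the bucket:

python setup.py sdist
gsutil cp dist/trainer-0.1.tar.gz gs://<BUCKET NAME>/package/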