
I created a training job where I fetch my data from BigQuery, perform training, and deploy the model. I would like to start training automatically in these two cases:

  1. More than 1000 new rows added to the dataset
  2. On a schedule (e.g., once a week)

I checked GCP Cloud Scheduler, but it seems it's not suitable for my case.

Ilkin

2 Answers


Cloud Scheduler is the right tool to trigger your training on a schedule; I'm not sure what your blocker is.

For your first point, you can't do that directly: you can't put a trigger on BigQuery (or on another database) to fire an event after X new rows. Instead, I recommend the following (a minimal sketch follows the list):

  • Schedule a job with Cloud Scheduler (for example, every 10 minutes)
  • The job performs a query in BigQuery and checks the number of rows added since the last training job (the date of the last training job must be stored somewhere; I recommend another BigQuery table)
    • If the number of new rows is > 1000, trigger your training job
    • Else, exit the function
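
Here is a minimal sketch of that scheduled check, assuming hypothetical names (`my_dataset.training_data`, an `ingestion_time` column, a bookkeeping table `my_dataset.training_runs`) and a `submit_training_job()` helper such as the one shown in the second answer:

from google.cloud import bigquery

THRESHOLD = 1000

def check_and_trigger(event, context):
    client = bigquery.Client()

    # Date of the last training run, kept in a small bookkeeping table.
    last_run = list(client.query(
        "SELECT MAX(trained_at) AS trained_at FROM `my_dataset.training_runs`"
    ).result())[0].trained_at

    # Count the rows ingested since that date.
    count_job = client.query(
        "SELECT COUNT(*) AS new_rows FROM `my_dataset.training_data` "
        "WHERE ingestion_time > @last_run",
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("last_run", "TIMESTAMP", last_run)
            ]
        ),
    )
    new_rows = list(count_job.result())[0].new_rows

    if new_rows > THRESHOLD:
        submit_training_job()  # e.g. projects.jobs.create, as in the second answer
    # else: fewer than 1000 new rows, nothing to do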

As you can see, it's not so easy and there are several caveats:

  • When you deploy your model, you also have to write the date of the latest training (see the snippet after this list)
  • You have to run the query against BigQuery repeatedly. Partition your table correctly to limit the cost
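
For the first caveat, a tiny sketch of recording the training date, reusing the hypothetical `my_dataset.training_runs` table from the sketch above:

from google.cloud import bigquery

# Record the date of the training run once the model has been deployed.
bigquery.Client().query(
    "INSERT INTO `my_dataset.training_runs` (trained_at) VALUES (CURRENT_TIMESTAMP())"
).result()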

Does it make sense for you?

EDIT

The gcloud command is a "simple" wrapper around API calls. Try adding the --log-http param to your gcloud command to see which API is called and with which params.

Anyway, you can start a job by calling this API directly, and if you want an example, use the --log-http param of the gcloud SDK!

guillaume blaquiere
  • Cloud Scheduler targets are HTTP, Pub/Sub and App Engine HTTP, but in order to create AI Platform jobs, I need the gcloud command. – Ilkin Jun 29 '20 at 08:38
  • No, you needn't. I edited my answer with more details. – guillaume blaquiere Jun 29 '20 at 08:49
  • 1
    I didn't know that part, I think this is the answer, thanks a lot – Ilkin Jun 29 '20 at 08:58
  • 1
    Did you manage to run this? We're trying to deploy something very similar, but are running into a blocker: we configured Google Scheduler to run the training job via the REST API... but then, job training requires a unique job name. is there a way to provide such unique name from the Scheduler? – Germán Sanchis Jul 29 '20 at 08:25

For anyone looking for a solution to submit a training job on a schedule, here is my solution after trying a few ways. I tried:

  • Running it through Cloud Composer using Airflow
  • Starting the job using a start script
  • Using cron with Cloud Scheduler, Pub/Sub and a Cloud Function

The easiest and most cost-effective way is using Cloud Scheduler and the AI Platform client library with a Cloud Function.

Step 1 - Create a Pub/Sub topic (for example, start-training)

Step 2 - Create a cron job using Cloud Scheduler targeting the start-training topic


Step 3 - Create a Cloud Function with trigger type Cloud Pub/Sub, topic start-training, and entry point submit_job. This function submits a training job to AI Platform through the Python client library.

Now we have this beautiful DAG

Scheduler -> Pub/Sub -> Cloud Function -> AI-platform
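
If you want to test the pipeline end to end without waiting for the cron, you can publish a message to the topic manually; a quick sketch, assuming the google-cloud-pubsub client:

from google.cloud import pubsub_v1

# Manually publish an (arbitrary) message to the start-training topic to
# trigger the Cloud Function once.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("<PROJECT ID>", "start-training")
publisher.publish(topic_path, b"manual trigger").result()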

The Cloud Function code goes like this:

main.py

import datetime
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

project = '<PROJECT ID>'  # use the project ID, not the project name
bucket_name = "<BUCKET NAME>"
project_id = 'projects/{}'.format(project)


def submit_job(event, context):
    # Build a unique job id per invocation; AI Platform rejects duplicate job ids.
    job_name = "training_" + datetime.datetime.now().strftime("%y%m%d_%H%M%S")

    training_inputs = {
        'scaleTier': 'BASIC',
        'packageUris': [f"gs://{bucket_name}/package/trainer-0.1.tar.gz"],
        'pythonModule': 'trainer.task',
        'region': 'asia-northeast1',
        'jobDir': f"gs://{bucket_name}",
        'runtimeVersion': '2.2',
        'pythonVersion': '3.7',
    }

    # Submit the training job through the AI Platform (ml, v1) API.
    job_spec = {"jobId": job_name, "trainingInput": training_inputs}
    cloudml = discovery.build("ml", "v1", cache_discovery=False)
    request = cloudml.projects().jobs().create(body=job_spec, parent=project_id)
    response = request.execute()

requirements.txt

google-api-python-client
oauth2client

Important

  • Make sure to use the project ID, not the project name; otherwise it will give a permission error.

  • If you get an ImportError: file_cache is unavailable when using oauth2client .... error, use cache_discovery=False in the build function; otherwise, leave the function to use the cache for performance reasons.

  • Point packageUris to the correct GCS location of your source package. In this case my package is named trainer, built and located in the package folder in the bucket, and the main module is task (a minimal packaging sketch follows this list).
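
To illustrate that last point, here is a minimal setup.py sketch for such a trainer package (the empty dependency list is an assumption; build it with `python setup.py sdist` and upload the resulting trainer-0.1.tar.gz to gs://<BUCKET NAME>/package/):

from setuptools import find_packages, setup

# Minimal packaging sketch for the `trainer` package referenced above.
setup(
    name="trainer",
    version="0.1",
    packages=find_packages(),  # picks up the trainer/ directory containing task.py
    install_requires=[],       # add the libraries trainer.task needs at training time
)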

Rajith Thennakoon
  • Hi! Do you know how to pass custom arguments in this method? I am using custom dockers and would like to pass `--epochs 5` as outlined here: https://cloud.google.com/ai-platform/training/docs/custom-containers-training – dendog Oct 19 '21 at 13:12