I am trying to use an existing TensorFlow model, which I have so far run locally, with Google Cloud ML Engine.
The model currently obtains its training data by passing filenames such as `my_model.train` and `my_model.eval` into `tf.data.TextLineDataset`. These filenames are currently hardcoded in the model's trainer, but I plan to refactor it so that it receives them as training application parameters (along with `--job-dir`) on the command line instead, e.g. like so:
my_trainer.py --job-dir job \
    --filename-train my_model.train --filename-eval my_model.eval
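For context, this is roughly what I have in mind for the refactored trainer. The flag names match the example above; the parsing, batching, and the rest of the training loop are just placeholders:

```python
# Sketch of the refactored trainer: the filenames arrive as command-line
# flags and are passed straight into tf.data.TextLineDataset.
import argparse
import tensorflow as tf

def make_dataset(filename):
    # Each line of the file becomes one string element of the dataset
    # (placeholder batching; real parsing would go here).
    return tf.data.TextLineDataset(filename).batch(32)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--job-dir', required=True)
    parser.add_argument('--filename-train', required=True)
    parser.add_argument('--filename-eval', required=True)
    args = parser.parse_args()

    train_ds = make_dataset(args.filename_train)
    eval_ds = make_dataset(args.filename_eval)
    # ... build the model and run training/evaluation here ...

if __name__ == '__main__':
    main()
```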
This should then also allow me to run the trainer locally with Cloud ML Engine:
gcloud ml-engine local train \
    --job-dir job \
    ... \
    -- \
    --filename-train my_model.train \
    --filename-eval my_model.eval
Am I making correct assumptions so far, and could I also run the same trainer in Google's cloud (after uploading my dataset files into `my_bucket`) by replacing the local filenames with Google Cloud Storage `gs://` URIs, e.g. like so:
gcloud ml-engine jobs submit training my_job \
    --job-dir gs://my_bucket/job \
    ... \
    -- \
    --filename-train gs://my_bucket/my_model.train \
    --filename-eval gs://my_bucket/my_model.eval
In other words, can `tf.data.TextLineDataset` handle `gs://` URIs as "filenames" transparently, or do I have to include special code in my trainer for processing such URIs beforehand?
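To illustrate what I mean by "special code": what I would like to avoid is having to stage the `gs://` files to local disk myself before building the dataset, along the lines of the sketch below. The helper name and staging directory are made up for illustration; I am only assuming that `tf.gfile` can read GCS paths:

```python
# Hypothetical fallback I would like to avoid: copy gs:// inputs to local
# disk before handing them to tf.data.TextLineDataset.
import os
import tensorflow as tf

def maybe_stage_locally(filename, staging_dir='/tmp/staging'):
    """Copy a gs:// file to local disk and return the local path;
    return local filenames unchanged."""
    if not filename.startswith('gs://'):
        return filename
    tf.gfile.MakeDirs(staging_dir)
    local_path = os.path.join(staging_dir, os.path.basename(filename))
    # tf.gfile.Copy understands Google Cloud Storage paths.
    tf.gfile.Copy(filename, local_path, overwrite=True)
    return local_path

train_ds = tf.data.TextLineDataset(maybe_stage_locally('gs://my_bucket/my_model.train'))
```

If `tf.data.TextLineDataset` already resolves `gs://` paths by itself, none of this staging would be necessary.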