
I'm trying to run my machine learning code, which trains on images using TensorFlow, on Google Cloud ML. However, it seems the submitted job can't access my files in my Cloud Shell or in GCS. Even though it works fine on my local machine, I get the following error once I submit the job using the gcloud command from the Cloud Shell:

ERROR   2017-12-19 13:52:28 +0100       service         IOError: [Errno 2] No such file or directory: '/home/user/pores-project-googleML/trainer/train.txt'

This file definitely exists in my Cloud Shell, and I can verify it by typing:

ls /home/user/pores-project-googleML/trainer/train.txt

I tried putting my file train.txt in GCS and accessing it from my code (by specifying the path gs://my_bucket/my_path), but once the job was submitted, I got a 'No such file or directory' error with the corresponding path.

To check where the job I submitted using gcloud is running, I added print(os.getcwd()) at the beginning of my Python code trainer/task.py, which printed /user_dir in the logs. I couldn't find this path using the Cloud Shell, nor in GCS. So my question is: how can I know on which machine my job is running? If it's in a container somewhere, how can I access my files in the Cloud Shell and in GCS from it?

Before all of this, I successfully completed the 'Image Classification using Flowers Dataset' tutorial.

The command I used to submit my job is:

gcloud ml-engine jobs submit training $JOB_NAME --job-dir $JOB_DIR --packages trainer-0.1.tar.gz --module-name $MAIN_TRAINER_MODULE --region us-central1

where:

TRAINER_PACKAGE_PATH=/home/user/pores-project-googleML/trainer

MAIN_TRAINER_MODULE="trainer.task"

JOB_DIR="gs://pores/AlexNet_CloudML/job_dir/"

JOB_NAME="census$(date +"%Y%m%d_%H%M%S")"
  • You will need to put your files on GCS. IIUC, when you try to access train.txt on GCS you see an error message about `/home/user/ ...`, is that correct? If so, it would seem the logic in your code is still trying to access `/home/user/` rather than the GCS bucket. – rhaertel80 Dec 19 '17 at 15:32
  • I actually have no problem accessing train.txt on GCS: with the command `gsutil ls gs://pores/AlexNet_CloudML`, I get all my files listed, including `train.txt`. When I tried with the GCS path instead of the Cloud Shell one, I got `No such file or directory: 'gs://pores/AlexNet_CloudML/train.txt'`. So using the GCS path didn't really solve the problem. – ronin_master Dec 19 '17 at 16:08
  • Was the GCS bucket created in the same project from which you submitted your training job? Our service can only access GCS buckets that were created in the same project where you enabled the service. A side note on Cloud Shell: Cloud Shell runs on a VM managed by Google, so the files in Cloud Shell are not accessible to other services. – Guoqing Xu Dec 20 '17 at 00:44
  • To make sure that my GCS bucket is in the same project from which I submitted my job, I did the following: `PROJECT_ID=$(gcloud config list project --format "value(core.project)")`, `BUCKET_NAME=${PROJECT_ID}-jobs`, and `gsutil mb -l $REGION gs://$BUCKET_NAME`. I then copied all my files to the new bucket and changed the paths in my code to point there. I still got a "No such file or directory" error with the new gs:// path. – ronin_master Dec 20 '17 at 11:08
  • Can you share your project number and the GCS path with us by sending an email to cloudml-feedback@google.com please? – Guoqing Xu Dec 20 '17 at 14:50
  • It's done. I remain at your disposal for any complementary information. – ronin_master Dec 20 '17 at 16:02
  • I was able to check that my job can actually access GCS by adding `print(tf.gfile.ListDirectory('gs://pores-187611-jobs/trainer/'))` to my code. However, I still can't read the file train.txt, which contains the paths and labels of my images. I didn't succeed with `numpy.loadtxt()`, with `open()`, nor with `with open():`. When I tried those, I got 'No such file or directory'. Is it possible to simply read a txt file with TensorFlow? If not, is it possible with another library that can access GCS? Many thanks!! – ronin_master Dec 21 '17 at 17:41

1 Answer


The regular Python IO library is not able to access files on GCS. Instead, you need to use the GCS Python client or the gsutil CLI to access GCS files.

Note that TensorFlow itself has native support for GCS (i.e., it can read GCS files directly).
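For example, here is a minimal sketch of reading a text file straight from GCS with TensorFlow's file API (assuming TF 1.x, as used on Cloud ML Engine at the time; the bucket path is the asker's and should be replaced with your own):

    import tensorflow as tf

    # tf.gfile understands gs:// paths natively, unlike Python's built-in open()
    with tf.gfile.GFile('gs://pores-187611-jobs/trainer/train.txt', 'r') as f:
        lines = f.read().splitlines()

    print('Read %d lines from GCS' % len(lines))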

Guoqing Xu
  • I am trying to use the open() function of the `cloudstorage` package. The problem is that this package works only in Python 3, while my job is running under Python 2.7. Now I get a syntax error at this line: `def get_driver(driver: DriverName) -> Driver:`. Is there any way to run my job under Python 3? Otherwise, is there another function that works on Python 2.7 to read my files in GCS? – ronin_master Dec 22 '17 at 14:39
  • You can use the gsutil CLI by calling `os.system('gsutil cp YOUR_GCS_FILE .')` to copy your GCS file onto the VM (see the sketch after these comments). – Guoqing Xu Dec 22 '17 at 18:28
  • Or use the tensorflow.python.lib.io library, as suggested in this [post](https://stackoverflow.com/questions/47942299/cloud-ml-unable-to-find-the-file-on-google-cloud-storage) (also sketched below). – Guoqing Xu Dec 22 '17 at 18:34
  • Thanks Guoqing Xu! That solves my problem of accessing my data once my job is submitted. The Python library I am now using is gcs_client, and it is supported in Python 2.7. – ronin_master Dec 26 '17 at 02:23
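
For anyone hitting the same problem, here is a minimal sketch of the two workarounds suggested in the comments above (assuming TensorFlow 1.x on the training VM; the GCS path is the asker's and should be replaced with your own):

    import os
    from tensorflow.python.lib.io import file_io

    # Option 1: shell out to the gsutil CLI to copy the file onto the
    # training VM, then read it with plain Python IO.
    os.system('gsutil cp gs://pores-187611-jobs/trainer/train.txt .')
    with open('train.txt') as f:
        local_lines = f.read().splitlines()

    # Option 2: read the file in place with TensorFlow's file_io module,
    # which understands gs:// paths and works on Python 2.7.
    with file_io.FileIO('gs://pores-187611-jobs/trainer/train.txt', 'r') as f:
        gcs_lines = f.read().splitlines()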