
I am working in Python with Google Cloud ML-Engine. The documentation I have found indicates that data should be stored in Cloud Storage buckets and blobs:

https://cloud.google.com/ml-engine/docs/tensorflow/working-with-cloud-storage

However, much of my code, and the libraries it calls, work with files. Can I somehow treat Google Storage as a file system in my ml-engine code?

I want my code to read like

with open(<something>) as f:
   for line in f:
      dosomething(line)

Note that in ml-engine one does not create and configure VM instances. So I cannot mount my own shared filesystem with Filestore.

desertnaut
opus111

2 Answers


The only way to have Cloud Storage appear as a filesystem is to mount a bucket as a file system:

You can use the Google Cloud Storage FUSE tool to mount a Cloud Storage bucket to your Compute Engine instance. The mounted bucket behaves similarly to a persistent disk even though Cloud Storage buckets are object storage.

But you cannot do that if you can't create and configure VMs.

"Note that in ml-engine one does not create and configure VM instances."

That's not entirely true. ML Engine supports building custom containers, which is typically how one installs and configures OS-level dependencies. Custom containers are only available for training, though, so this is worth a try only if that is where you need the filesystem.

I assume you have already checked that the library does not accept an already open file-like handle. If you have not, the question "How to restore Tensorflow model from Google bucket without writing to filesystem?" may be of interest.
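To illustrate what "accepting an already open file-like handle" means, here is a minimal sketch. The function name `count_nonblank_lines` is hypothetical, and `io.StringIO` is only an in-memory stand-in so the snippet runs without any cloud access. The assumption is that a library written this way would work equally well with a handle obtained from something other than Python's built-in `open()`.

```python
import io


def count_nonblank_lines(handle):
    """Count non-blank lines from any iterable, file-like object.

    The function never calls open() itself, so the caller decides
    where the data actually lives (local disk, memory, object storage).
    """
    count = 0
    for line in handle:
        if line.strip():
            count += 1
    return count


# In-memory stand-in for a remote handle (an assumption for illustration).
sample = io.StringIO("alpha\n\nbeta\n")
result = count_nonblank_lines(sample)
print(result)
```

Code that takes a path and calls `open()` internally cannot be used this way, which is the distinction that matters when choosing between mounting a bucket and passing a handle.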

Dan Cornilescu
  • Thank you Dan. I already tried something like open("gs://bucketname/foo.bar") and the package google.cloud.storage does not seem to have an open() method. I am surprised because I would expect a normal file-like interface to persistent storage. I feel I must be missing it – opus111 Mar 14 '19 at 13:11
  • You can use TensorFlow FileIO library to open GCS files. Please follow this link: https://stackoverflow.com/questions/42799117/google-cloud-ml-and-gcs-bucket-issues – Guoqing Xu Mar 14 '19 at 17:38
  • @user1902291 being an actual filesystem file (on which python's default `open()` works) and having a file-like handler obtained through some call other than python's `open()` are two different things... – Dan Cornilescu Mar 14 '19 at 19:55

For those who come after, here is the answer, taken from the question "Google Cloud ML and GCS Bucket issues":

from tensorflow.python.lib.io import file_io

Here is an example

# Cloud Storage paths use the gs:// scheme
with file_io.FileIO("gs://bucket_name/foobar.txt", "w") as f:
    f.write("FOO")
    f.flush()
    print("Write foobar.txt")

with file_io.FileIO("gs://bucket_name/foobar.txt", "r") as f:
    for line in f:
        print("Read foobar.txt: " + line)
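If the same code has to run both locally and on ML Engine, one possible approach is a small wrapper that picks the opener from the path prefix. This is only a sketch: `open_any` and `GCS_PREFIX` are hypothetical names, not part of TensorFlow, and the demonstration below only exercises the local branch so that it runs without TensorFlow or a bucket.

```python
import os
import tempfile

GCS_PREFIX = "gs://"  # hypothetical constant, for illustration only


def open_any(path, mode="r"):
    """Open gs:// paths with TensorFlow's FileIO, everything else with open()."""
    if path.startswith(GCS_PREFIX):
        # Imported lazily so local-only runs do not need TensorFlow installed.
        from tensorflow.python.lib.io import file_io
        return file_io.FileIO(path, mode)
    return open(path, mode)


# Local demonstration: write a file, then read it back line by line.
with tempfile.TemporaryDirectory() as tmp:
    local_path = os.path.join(tmp, "foobar.txt")
    with open_any(local_path, "w") as f:
        f.write("FOO\n")
    with open_any(local_path, "r") as f:
        lines = [line.rstrip("\n") for line in f]

print(lines)
```

With a wrapper like this, the loop from the question stays unchanged and only the call that opens the file differs from plain `open()`.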
opus111