
To train a TensorFlow model, I'm loading a custom dataset from a Google Cloud Platform bucket as follows:

import os

import numpy as np
import tensorflow as tf
import tensorflow_cloud as tfc

GCP_BUCKET = "stereo-train"

tfc.run(
    requirements_txt="requirements.txt",
    chief_config=tfc.MachineConfig(
        cpu_cores=8,
        memory=30,
        accelerator_type=tfc.AcceleratorType.NVIDIA_TESLA_T4,
        accelerator_count=1,
    ),
    docker_image_bucket_name=GCP_BUCKET,
)
kitti = "gs://stereo-train/data_scene_flow"


kitti_train = str(kitti + "/training/dat/data/")

img_height = 375
img_width = 1242

feature_size = 32
batch_size = 6
filenames = np.sort(np.asarray(os.listdir(kitti_train))).tolist()
# Read and decode each file into a list of image tensors.
ds = list(map(lambda x: tf.io.decode_image(tf.io.read_file(kitti_train + x)), filenames))

But the Google Cloud Platform console gives me the following error:

FileNotFoundError: [Errno 2] No such file or directory: 'gs://stereo-train/data_scene_flow/training/dat/data/'

The stereo-train bucket does exist, and it contains this directory hierarchy.

zendevil

2 Answers


The tf.io.read_file() method is for local files and doesn't work with the gs:// protocol. Instead, you should use tf.io.gfile.GFile().
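
A minimal sketch of what that might look like for the paths in the question (assuming the gs:// URL and directory layout shown above are correct):

import numpy as np
import tensorflow as tf

kitti_train = "gs://stereo-train/data_scene_flow/training/dat/data/"

# tf.io.gfile understands gs:// paths, unlike os.listdir().
filenames = sorted(tf.io.gfile.listdir(kitti_train))

images = []
for name in filenames:
    # GFile opens the object in the bucket; read its bytes and decode them.
    with tf.io.gfile.GFile(kitti_train + name, "rb") as f:
        images.append(tf.io.decode_image(f.read()))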

Dustin Ingram

If you are running it locally, it may be a connection issue; install the Google Cloud SDK and try to access the bucket from its command-line tools to rule that out.
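
You can also do a quick check from Python (a sketch, assuming TensorFlow is installed locally and your Google Cloud credentials are configured) by asking TensorFlow's file system layer whether it can see the bucket at all:

import tensorflow as tf

# If these calls raise or return False/empty, the problem is connectivity
# or credentials, not the training code itself.
print(tf.io.gfile.exists("gs://stereo-train/data_scene_flow"))
print(tf.io.gfile.listdir("gs://stereo-train/data_scene_flow/"))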