
This is the main code, which works on a CPU machine. It loads all images and masks from their folders, resizes them, and saves them as two NumPy arrays.

from glob import glob

import numpy as np
from skimage.io import imread
from skimage.transform import resize as imresize


def create_data(dir_input, img_size):

    img_files = sorted(glob(dir_input + '/images/*.jpg'))
    mask_files = sorted(glob(dir_input + '/masks/*.png'))

    X = []
    Y = []

    for img_path, mask_path in zip(img_files, mask_files):

        # Read each image/mask pair and resize it to (img_size, img_size)
        img = imread(img_path)
        img = imresize(img, (img_size, img_size), mode='reflect', anti_aliasing=True)

        mask = imread(mask_path)
        mask = imresize(mask, (img_size, img_size), mode='reflect', anti_aliasing=True)

        X.append(img)
        Y.append(mask)

    # Save both stacks next to the input folders
    path_x = dir_input + '/images-{}.npy'.format(img_size)
    path_y = dir_input + '/masks-{}.npy'.format(img_size)

    np.save(path_x, np.array(X))
    np.save(path_y, np.array(Y))
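
I call it locally like this (the local path and size are just an example):

create_data('data/inputs', 128)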


Here is the Google Cloud Storage hierarchy:

gs://my_bucket
|
|----inputs    
|      |----images/
|      |----masks/
|   
|----outputs
|
|----trainer    


dir_input should be gs://my_bucket/inputs

This doesn't work. What is the proper way to load the images from that path in the cloud, and to save the NumPy arrays in the inputs folder?

Preferably with skimage, which is already listed in setup.py.

elektricni

1 Answer


Most Python libraries such as numpy don't natively support reading from and writing to object stores like GCS or S3. There are a few options:

  • Copy the data to local disk first (see this answer).
  • Try using the GCS Python SDK (docs).
  • Use another library, like TensorFlow's FileIO abstraction. Here's some code similar to what you're trying to do (read/write numpy arrays); a sketch is also included below.

The latter is particularly useful if you are using TensorFlow, but can still be used even if you are using some other framework.
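
A rough sketch of that last option, assuming TensorFlow is installed and the job's service account can read and write the bucket. tf.io.gfile is the current name of the FileIO abstraction (older releases expose it as tensorflow.python.lib.io.file_io), and whether imread accepts a file-like object depends on your skimage/imageio version, so treat this as a starting point rather than drop-in code:

import io

import numpy as np
import tensorflow as tf
from skimage.io import imread
from skimage.transform import resize as imresize


def create_data(dir_input, img_size):

    # tf.io.gfile understands gs:// paths, so globbing, reading and writing
    # all go through it instead of the local filesystem.
    img_files = sorted(tf.io.gfile.glob(dir_input + '/images/*.jpg'))
    mask_files = sorted(tf.io.gfile.glob(dir_input + '/masks/*.png'))

    X = []
    Y = []

    for img_path, mask_path in zip(img_files, mask_files):

        # Read the raw bytes from GCS, then decode and resize with skimage as before
        with tf.io.gfile.GFile(img_path, 'rb') as f:
            img = imread(io.BytesIO(f.read()))
        img = imresize(img, (img_size, img_size), mode='reflect', anti_aliasing=True)

        with tf.io.gfile.GFile(mask_path, 'rb') as f:
            mask = imread(io.BytesIO(f.read()))
        mask = imresize(mask, (img_size, img_size), mode='reflect', anti_aliasing=True)

        X.append(img)
        Y.append(mask)

    # np.save accepts a file object, and GFile provides one that writes to GCS
    with tf.io.gfile.GFile(dir_input + '/images-{}.npy'.format(img_size), 'wb') as f:
        np.save(f, np.array(X))
    with tf.io.gfile.GFile(dir_input + '/masks-{}.npy'.format(img_size), 'wb') as f:
        np.save(f, np.array(Y))

With this version, create_data('gs://my_bucket/inputs', 128) (128 is just an example size) reads the images from, and writes the .npy files back to, gs://my_bucket/inputs.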

rhaertel80