Problem in creating own image dataset using tf.keras.preprocessing.image_dataset_from_directory

Question

I am trying to build my own image dataset using tf.keras.preprocessing.image_dataset_from_directory.

The images are stored in separate folders according to their class. Each image is 2048 x 2048, and there are 3364 images in total across all the subfolders.

To create the dataset, I wrote the following code:

import tensorflow as tf


def create_dataset():
    img_lib = 'data_util/cls_imgs'
    
    ds_train = tf.keras.preprocessing.image_dataset_from_directory(
        img_lib,
        labels = 'inferred',
        label_mode = 'int',  # Also, tried categorical
        class_names = ['A','B', 'C', 'D', 'E'],
        color_mode = 'rgb',
        shuffle = True,
        batch_size = 256,
        image_size = (64,64),
        seed = 42,
        validation_split = 0.2,
        subset = 'training')

    ds_validation = tf.keras.preprocessing.image_dataset_from_directory(
        img_lib,
        labels = 'inferred',
        label_mode = 'int',  # Also, tried categorical
        class_names = ['A','B', 'C', 'D', 'E'],
        color_mode = 'rgb',
        shuffle = True,
        batch_size = 256,
        image_size = (64,64),
        seed = 42,
        validation_split = 0.2,
        subset = 'validation')
     
    return ds_train, ds_validation

Unfortunately, I am getting the following error:

Found 3364 files belonging to 5 classes.
Traceback (most recent call last):
  File "_mt19937.pyx", line 178, in numpy.random._mt19937.MT19937._legacy_seeding
TypeError: 'float' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Z:/User Folders/Documents/surface_defect/main.py", line 59, in <module>
    main()
  File "Z:/User Folders/Documents/surface_defect/main.py", line 52, in main
    data_gen(db_path, height, width, color, batch_size, seed, validation_split,
  File "Z:/User Folders/Documents/surface_defect/main.py", line 25, in data_gen
    train_data, test_data = img_data_label.create_dataset()
  File "Z:\User Folders\Documents\surface_defect\data_util\dataset_generation.py", line 61, in create_dataset
    ds_train = tf.keras.preprocessing.image_dataset_from_directory(
  File "C:\Program Files\WinPython64-3.8.6\python-3.8.6.amd64\lib\site-packages\tensorflow\python\keras\preprocessing\image_dataset.py", line 175, in image_dataset_from_directory
    image_paths, labels, class_names = dataset_utils.index_directory(
  File "C:\Program Files\WinPython64-3.8.6\python-3.8.6.amd64\lib\site-packages\tensorflow\python\keras\preprocessing\dataset_utils.py", line 116, in index_directory
    rng = np.random.RandomState(seed)
  File "mtrand.pyx", line 183, in numpy.random.mtrand.RandomState.__init__
  File "_mt19937.pyx", line 166, in numpy.random._mt19937.MT19937._legacy_seeding
  File "_mt19937.pyx", line 186, in numpy.random._mt19937.MT19937._legacy_seeding
TypeError: Cannot cast scalar from dtype('float64') to dtype('int64') according to the rule 'safe'

Please help me understand this problem and resolve it.

You forgot to add the comma after setting the arguments `color_mode`, `shuffle`, `batch_size`. — yudhiesh, Apr 24 '21 at 07:30
Thanks @yudhiesh. Apologies for that error. I manually typed the script here and forgot the comma. In the actual code, it is there. — Raj Rajeshwari Prasad, Apr 24 '21 at 07:38
Ok could you include the folder structure of `img_lib`? Also include the version of Keras and Tensorflow you are using. — yudhiesh, Apr 24 '21 at 07:45
I am really sorry, I can't do that due to company policy. There are 5 folders inside data_util/cls_imgs, and each folder contains a number of images. In total there are 3364 images. — Raj Rajeshwari Prasad, Apr 24 '21 at 07:53
Can this issue be related to the images? I don't think it is related to the folder structure. Or is there something fundamental about this method that I don't know? — Raj Rajeshwari Prasad, Apr 24 '21 at 09:11
Try using the method I answered in this [question](https://stackoverflow.com/questions/64531236/unidentifiedimageerror-when-training-a-model-using-tf-imagegenerator/64531424#64531424). — yudhiesh, Apr 24 '21 at 11:53
Apologies for the late reply, but the solution from your post also didn't work for this problem. Instead, I used a different technique to solve it. — Raj Rajeshwari Prasad, Apr 27 '21 at 06:16

score 0 · Accepted Answer · answered Apr 27 '21 at 06:19

I was able to work around the problem with the following approach, which uses ImageDataGenerator.flow_from_directory on pre-split training and testing folders. It is not elegant, but it worked for me.

import os

# Assumed import path; the original snippet did not show its imports.
from tensorflow.keras.preprocessing.image import ImageDataGenerator


def create_dataset(self):
    # Count the files in each split so that one batch can hold a whole split.
    no_of_training_samples = 0
    no_of_testing_samples = 0
    for base, dirs, files in os.walk('data_util/train_test_dir/training_dir'):
        no_of_training_samples += len(files)

    for base, dirs, files in os.walk('data_util/train_test_dir/testing_dir'):
        no_of_testing_samples += len(files)

    # Rescale pixel values from [0, 255] to [0, 1].
    train = ImageDataGenerator(rescale=1 / 255)
    validate = ImageDataGenerator(rescale=1 / 255)
    test = ImageDataGenerator(rescale=1 / 255)
    training_dataset = train.flow_from_directory('data_util/train_test_dir/training_dir',
                                                 target_size=(self.img_height, self.img_width),
                                                 batch_size=no_of_training_samples,
                                                 class_mode='sparse',
                                                 shuffle=self.shuffle,
                                                 color_mode=self.color_channel,
                                                 seed=self.rand_seed)

    # batch_size is at least the number of training images, so a single
    # call to next() returns the entire training set as arrays.
    x_train, y_train = next(training_dataset)
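
A note on the likely root cause: the traceback in the question ends inside `np.random.RandomState(seed)` with a float-to-int cast error. That suggests the `seed` that actually reached `image_dataset_from_directory` was a float (for example `42.0`), rather than the literal `42` shown in the posted snippet. This is an inference from the traceback and was not confirmed in the thread. If it is the case, converting the seed to a plain `int` before the call may be enough. A minimal sketch, where `as_int_seed` is a hypothetical helper name and not part of TensorFlow or NumPy:

```python
from numbers import Integral, Real


def as_int_seed(seed):
    """Return `seed` as a plain int, or None if no seed was given.

    Hypothetical helper for illustration; not a TensorFlow or NumPy API.
    """
    if seed is None:
        return None
    if isinstance(seed, bool):
        # bool is a subclass of int, but is almost certainly a mistake here.
        raise TypeError("seed must be a number, not a bool")
    if isinstance(seed, Integral):
        return int(seed)
    # Accept floats such as 42.0 that carry a whole-number value.
    if isinstance(seed, Real) and float(seed).is_integer():
        return int(seed)
    raise ValueError(f"seed must be a whole number, got {seed!r}")


print(as_int_seed(42.0))  # 42
```

With this, the call in the question would pass `seed=as_int_seed(seed)` and leave the other arguments unchanged. Rejecting values like `0.5` instead of silently truncating them is a design choice, so that a misconfigured seed fails early with a clear message.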
