I am trying to build my own image dataset using tf.keras.preprocessing.image_dataset_from_directory.

The images are kept in separate folders according to their class. Each image is 2048 x 2048 pixels, and there are 3364 images in total across all the subfolders.

To create the dataset, I wrote the following function:

def create_dataset():
    img_lib = 'data_util/cls_imgs'
    
    ds_train = tf.keras.preprocessing.image_dataset_from_directory(
        img_lib,
        labels = 'inferred',
        label_mode = 'int',  # Also, tried categorical
        class_names = ['A','B', 'C', 'D', 'E'],
        color_mode = 'rgb',
        shuffle = True,
        batch_size = 256,
        image_size = (64,64),
        seed = 42,
        validation_split = 0.2,
        subset = 'training')

    ds_validation = tf.keras.preprocessing.image_dataset_from_directory(
        img_lib,
        labels = 'inferred',
        label_mode = 'int',  # Also, tried categorical
        class_names = ['A','B', 'C', 'D', 'E'],
        color_mode = 'rgb',
        shuffle = True,
        batch_size = 256,
        image_size = (64,64),
        seed = 42,
        validation_split = 0.2,
        subset = 'validation')
     
     return ds_train, ds_validation

Unfortunately, I am getting the following error:

Found 3364 files belonging to 5 classes.
Traceback (most recent call last):
  File "_mt19937.pyx", line 178, in numpy.random._mt19937.MT19937._legacy_seeding
TypeError: 'float' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "Z:/User Folders/Documents/surface_defect/main.py", line 59, in <module>
    main()
  File "Z:/User Folders/Documents/surface_defect/main.py", line 52, in main
    data_gen(db_path, height, width, color, batch_size, seed, validation_split,
  File "Z:/User Folders/Documents/surface_defect/main.py", line 25, in data_gen
    train_data, test_data = img_data_label.create_dataset()
  File "Z:\User Folders\Documents\surface_defect\data_util\dataset_generation.py", line 61, in create_dataset
    ds_train = tf.keras.preprocessing.image_dataset_from_directory(
  File "C:\Program Files\WinPython64-3.8.6\python-3.8.6.amd64\lib\site-packages\tensorflow\python\keras\preprocessing\image_dataset.py", line 175, in image_dataset_from_directory
    image_paths, labels, class_names = dataset_utils.index_directory(
  File "C:\Program Files\WinPython64-3.8.6\python-3.8.6.amd64\lib\site-packages\tensorflow\python\keras\preprocessing\dataset_utils.py", line 116, in index_directory
    rng = np.random.RandomState(seed)
  File "mtrand.pyx", line 183, in numpy.random.mtrand.RandomState.__init__
  File "_mt19937.pyx", line 166, in numpy.random._mt19937.MT19937._legacy_seeding
  File "_mt19937.pyx", line 186, in numpy.random._mt19937.MT19937._legacy_seeding
TypeError: Cannot cast scalar from dtype('float64') to dtype('int64') according to the rule 'safe'

Please help me understand what causes this error and how to fix it.

1 Answer

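I was able to work around the problem with the approach below. Instead of image_dataset_from_directory, it loads the images with ImageDataGenerator.flow_from_directory and uses a file count as the batch size, so the whole training set arrives in a single batch. It is not elegant, but it worked for me.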

import os
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def create_dataset(self):
    # Count the image files in each split so a split can be loaded as one batch.
    no_of_training_samples = 0
    no_of_testing_samples = 0
    for base, dirs, files in os.walk('data_util/train_test_dir/training_dir'):
        no_of_training_samples += len(files)

    for base, dirs, files in os.walk('data_util/train_test_dir/testing_dir'):
        no_of_testing_samples += len(files)

    train = ImageDataGenerator(rescale=1 / 255)
    validate = ImageDataGenerator(rescale=1 / 255)
    test = ImageDataGenerator(rescale=1 / 255)
    training_dataset = train.flow_from_directory('data_util/train_test_dir/training_dir',
                                                 target_size=(self.img_height, self.img_width),
                                                 batch_size=no_of_training_samples,
                                                 class_mode='sparse',
                                                 shuffle=self.shuffle,
                                                 color_mode=self.color_channel,
                                                 seed=self.rand_seed)

    x_train, y_train = next(training_dataset)
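
A note on the original error: the traceback ends in np.random.RandomState(seed) refusing to cast a float64 to int64, which suggests that the seed reaching image_dataset_from_directory was a float (for example 42.0 read from a config file) rather than the literal 42 shown in the question. That is an assumption, since the calling code in main.py is not shown. If it holds, a less invasive fix may be to convert the seed to a plain int before passing it on. The helper below is a minimal sketch; coerce_seed is a hypothetical name, not part of TensorFlow or NumPy.

```python
import operator


def coerce_seed(seed):
    """Return seed as a plain int (or None), rejecting non-integral values.

    Hypothetical helper: it only normalises the value before it is handed
    to an API that expects an integer seed.
    """
    if seed is None:
        return None
    try:
        # Accepts int and integer-like objects (e.g. NumPy integer scalars).
        return operator.index(seed)
    except TypeError:
        pass
    # Fall back to float conversion for values such as 42.0 or "42".
    as_float = float(seed)
    if not as_float.is_integer():
        raise ValueError(f"seed must be a whole number, got {seed!r}")
    return int(as_float)
```

With this in place, the call in the question would pass seed = coerce_seed(seed) instead of the raw value, and both the training and validation calls should receive the same converted seed so that the two subsets do not overlap.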