
So I designed a CNN and compiled it with the following parameters:

import csv
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import RMSprop

training_file_loc = "8-SignLanguageMNIST/sign_mnist_train.csv"
testing_file_loc = "8-SignLanguageMNIST/sign_mnist_test.csv"

def getData(filename):
    images = []
    labels = []
    with open(filename) as csv_file:
        file = csv.reader(csv_file, delimiter = ",")
        next(file, None)
        
        for row in file:
            label = row[0]
            data = row[1:]
            img = np.array(data).reshape(28,28)
            
            images.append(img)
            labels.append(label)
        
        images = np.array(images).astype("float64")
        labels = np.array(labels).astype("float64")
        
    return images, labels

training_images, training_labels = getData(training_file_loc)
testing_images, testing_labels = getData(testing_file_loc)

print(training_images.shape, training_labels.shape)
print(testing_images.shape, testing_labels.shape)

training_images = np.expand_dims(training_images, axis = 3)
testing_images = np.expand_dims(testing_images, axis = 3)

training_datagen = ImageDataGenerator(
    rescale = 1/255,
    rotation_range = 45,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = "nearest"
)

training_generator = training_datagen.flow(
    training_images,
    training_labels,
    batch_size = 64,
)


validation_datagen = ImageDataGenerator(
    rescale = 1/255,
    rotation_range = 45,
    width_shift_range = 0.2,
    height_shift_range = 0.2,
    shear_range = 0.2,
    zoom_range = 0.2,
    horizontal_flip = True,
    fill_mode = "nearest"
)

validation_generator = validation_datagen.flow(
    testing_images,
    testing_labels,
    batch_size = 64,
)

model = tf.keras.Sequential([
    keras.layers.Conv2D(16, (3, 3), input_shape = (28, 28, 1), activation = "relu"),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Conv2D(32, (3, 3), activation = "relu"),
    keras.layers.MaxPooling2D(2, 2),
    keras.layers.Flatten(),
    keras.layers.Dense(256, activation = "relu"),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(512, activation = "relu"),
    keras.layers.Dropout(0.25),
    keras.layers.Dense(26, activation = "softmax")
])

model.compile(
    loss = "categorical_crossentropy",
    optimizer = RMSprop(learning_rate = 0.001),
    metrics = ["accuracy"]
)

But when I ran model.fit(), I got the following error:

ValueError: Shapes (None, 1) and (None, 24) are incompatible

After changing the loss function to sparse_categorical_crossentropy, the program worked fine.

I don't understand why this happened.

Can anyone explain this and also the difference between those loss functions?
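For context, the difference in what each loss expects as labels can be sketched in plain NumPy. The two functions below are illustrative re-implementations, not the Keras internals:

```python
import numpy as np

# Predicted class probabilities for 3 samples over 4 classes (rows sum to 1).
probs = np.array([[0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.7, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.7]])

int_labels = np.array([0, 1, 3])   # shape (3,)  -> sparse_categorical_crossentropy
onehot = np.eye(4)[int_labels]     # shape (3, 4) -> categorical_crossentropy

# categorical_crossentropy: labels must match the prediction shape.
cat_loss = -np.sum(onehot * np.log(probs), axis=-1)

# sparse_categorical_crossentropy: labels are plain class indices.
sparse_loss = -np.log(probs[np.arange(len(int_labels)), int_labels])

print(np.allclose(cat_loss, sparse_loss))  # True -- same loss, different label format
```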

Karan Owalekar
  • Here is an explanation of the differences between those losses: https://stackoverflow.com/questions/49161174/tensorflow-logits-and-labels-must-have-the-same-first-dimension/62286888#62286888 – Marco Cerliani Jul 21 '20 at 09:09

2 Answers


The issue is that categorical_crossentropy expects one-hot-encoded labels: for each sample, it expects a tensor of length num_classes in which the element at the label's index is 1 and every other element is 0.

On the other hand, sparse_categorical_crossentropy uses integer labels directly (its intended use case is a large number of classes, where one-hot-encoded labels would waste memory on mostly zeros). I believe, though I can't confirm it, that categorical_crossentropy runs faster than its sparse counterpart.
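The shape difference is easy to see in isolation. As a NumPy sketch, np.eye(26) builds the same one-hot rows that tf.keras.utils.to_categorical would produce:

```python
import numpy as np

int_labels = np.array([3, 0, 25])     # what the CSV loader yields: shape (3,)
onehot = np.eye(26)[int_labels]       # one-hot version: shape (3, 26), one row per sample

print(int_labels.shape, onehot.shape)  # (3,) (3, 26)
print(onehot[0].argmax())              # 3 -- the single 1 sits at the label's index
```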

For your case, with 26 classes, I'd recommend using the non-sparse version and transforming your labels to be one-hot encoded, like so:

import csv
import numpy as np
import tensorflow as tf

def getData(filename):
    images = []
    labels = []
    with open(filename) as csv_file:
        file = csv.reader(csv_file, delimiter = ",")
        next(file, None)
        
        for row in file:
            label = row[0]
            data = row[1:]
            img = np.array(data).reshape(28,28)
            
            images.append(img)
            labels.append(label)
        
        images = np.array(images).astype("float64")
        labels = np.array(labels).astype("float64")
        
    return images, tf.keras.utils.to_categorical(labels, num_classes=26) # you can omit num_classes to have it computed from the data

Side note: unless you have a reason to use float64 for the images, I'd switch to float32. It halves the memory required for the dataset, and the model most likely casts the inputs to float32 as its first operation anyway.
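For a rough sense of scale (27455 is the usual Sign Language MNIST training-set size; adjust to whatever your shape printout says):

```python
import numpy as np

# Same shape as the loaded training images.
imgs64 = np.zeros((27455, 28, 28), dtype="float64")
imgs32 = imgs64.astype("float32")  # exactly half the bytes per element

print(imgs64.nbytes // 2**20, "MiB vs", imgs32.nbytes // 2**20, "MiB")  # 164 MiB vs 82 MiB
```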

GPhilo
  • One more question: as you said, it is better to use integer labels instead of one-hot encoding because of memory. So when I use "flow_from_directory" to generate labels for images, does that generate one-hot-encoded labels? – Karan Owalekar Jul 21 '20 at 09:57
  • I actually said the opposite: prefer one-hot encoding unless you have memory issues. flow_from_directory returns the labels according to the class_mode argument (have a look at the docs for all possible values). – GPhilo Jul 21 '20 at 10:00

Simple: for classification problems where the labels are integers, sparse_categorical_crossentropy is used; where the labels have been converted to one-hot encodings, we use categorical_crossentropy.

Nivesh Gadipudi