I'm researching the possibility of implementing a CNN to classify images as "good" or "bad", but I'm having no luck with my current architecture.
Characteristics that denote a "bad" image:
- Overexposure
- Oversaturation
- Incorrect white balance
- Blurriness
Would it be feasible to implement a neural network to classify images based on these characteristics, or is this best left to a traditional algorithm that simply looks at the variance in brightness/contrast across an image and classifies it that way?
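For reference, here is the kind of traditional baseline I have in mind, as a minimal sketch using OpenCV and NumPy (the function name and all thresholds are placeholders that would need tuning on real data):

import cv2
import numpy as np

def simple_quality_flags(path, blur_thresh=100.0, clip_frac=0.05, sat_thresh=200):
    # load the image and derive greyscale/HSV views
    bgr = cv2.imread(path)
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    # blurriness: variance of the Laplacian drops on soft/blurry images
    blurry = cv2.Laplacian(gray, cv2.CV_64F).var() < blur_thresh
    # overexposure: a large fraction of pixels clipped near the top of the range
    overexposed = np.mean(gray > 245) > clip_frac
    # oversaturation: a very high mean in the HSV saturation channel
    oversaturated = hsv[..., 1].mean() > sat_thresh
    # crude white-balance check: channel means should be roughly equal
    b, g, r = (bgr[..., i].mean() for i in range(3))
    bad_white_balance = max(b, g, r) - min(b, g, r) > 40
    return {"blurry": blurry, "overexposed": overexposed,
            "oversaturated": oversaturated, "bad_white_balance": bad_white_balance}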
I have attempted to train a CNN based on the VGGNet architecture, but I always seem to end up with a biased and unreliable model, regardless of the number of epochs or steps per epoch.
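To see whether "biased" means the network collapses to one class, a confusion matrix on held-out data is the quickest check; a minimal sketch, assuming the validation data is available as NumPy arrays (val_images and val_labels are placeholder names):

from sklearn.metrics import confusion_matrix

probs = model.predict(val_images)           # shape (n, 1) with a single sigmoid output
preds = (probs.ravel() > 0.5).astype(int)   # threshold at 0.5; use argmax for a softmax output
print(confusion_matrix(val_labels, preds))  # rows = true class, columns = predicted class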
My current model's architecture is very simple (I am new to the whole machine learning world), but it seemed to work fine with other classification problems, and I have modified it slightly to work better with this binary classification problem:
# assuming the usual Keras imports and a fresh Sequential model:
from keras.models import Sequential
from keras.layers import (Conv2D, Activation, BatchNormalization,
                          MaxPooling2D, Dropout, Flatten, Dense)

model = Sequential()  # input_shape and channel_dimension are defined earlier in the script

# CONV => RELU => POOL layer set
# define convolutional layers, use the ReLU activation function
# and reduce the spatial size (width and height) with pool layers
model.add(Conv2D(32, (3, 3), padding="same", input_shape=input_shape))  # 32 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))  # helps prevent overfitting (25% of neurons dropped randomly)
# (CONV => RELU) * 2 => POOL layer set (increasing the number of filters as you go deeper into the CNN)
model.add(Conv2D(64, (3, 3), padding="same"))  # 64 3x3 filters; input_shape is only needed on the first layer
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(64, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# (CONV => RELU) * 3 => POOL layer set (the input volume keeps getting smaller)
model.add(Conv2D(128, (3, 3), padding="same"))  # 128 3x3 filters
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(Conv2D(128, (3, 3), padding="same"))
model.add(Activation("relu"))
model.add(BatchNormalization(axis=channel_dimension))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
# only set of FC => RELU layers
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(BatchNormalization())
model.add(Dropout(0.5))
# sigmoid classifier (output layer): a single sigmoid unit is enough for a
# binary problem, paired with binary_crossentropy at compile time
# (Dense(classes) with a sigmoid does not match sparse_categorical_crossentropy)
model.add(Dense(1))
model.add(Activation("sigmoid"))
Are there any glaring omissions or mistakes in this model, or can I simply not solve this problem using deep learning (with my current GPU, a GTX 970)?
Thanks for your time and experience,
Josh
EDIT: Here is my code for compiling/training the model:
# assuming the corresponding Keras imports:
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau, TensorBoard
from time import time

# initialise the optimiser and compile the model
print("[INFO] Training network...")
opt = SGD(lr=initial_lr, decay=initial_lr / epochs)
model.compile(loss="binary_crossentropy", optimizer=opt, metrics=["accuracy"])  # matches the single-sigmoid output layer
# set up checkpoints
model_name = "output/50_epochs_{epoch:02d}_{val_acc:.2f}.model"
checkpoint = ModelCheckpoint(model_name, monitor='val_acc', verbose=1,
save_best_only=True, mode='max')
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
patience=5, min_lr=0.001)
tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
callbacks_list = [checkpoint, reduce_lr, tensorboard]
# train the network
H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50, validation_data=test_set, validation_steps=150, callbacks=callbacks_list)
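In case the bias comes from a class imbalance between "good" and "bad" training images, weighting the loss per class is a cheap first fix. A minimal sketch, assuming training_set comes from ImageDataGenerator.flow_from_directory (so it exposes a classes array of integer labels):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# weight each class inversely to its frequency in the training data
labels = training_set.classes
weights = compute_class_weight(class_weight="balanced", classes=np.unique(labels), y=labels)
H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50,
                        validation_data=test_set, validation_steps=150,
                        class_weight=dict(enumerate(weights)), callbacks=callbacks_list)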