
I'm researching the possibility of implementing a CNN in order to classify images as "good" or "bad" but am having no luck with my current architecture.

Characteristics that denote a "bad" image:

  • Overexposure
  • Oversaturation
  • Incorrect white balance
  • Blurriness

Would it be feasible to implement a neural network to classify images based on these characteristics, or is it best left to a traditional algorithm that simply looks at the variance in brightness/contrast throughout an image and classifies it that way?
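
For reference, the kind of traditional baseline I have in mind would be something along these lines (a rough sketch using OpenCV; the function name and all thresholds are made up and would need tuning, and it doesn't attempt a white-balance check):

    import cv2

    def simple_quality_check(path):
        # rough heuristic baseline; all thresholds are guesses
        img = cv2.imread(path)                                  # BGR uint8
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

        overexposed = (gray > 240).mean() > 0.25                # large share of near-white pixels
        oversaturated = (hsv[..., 1].mean() / 255.0) > 0.80     # very high mean saturation
        blurry = cv2.Laplacian(gray, cv2.CV_64F).var() < 100.0  # little high-frequency detail

        return "bad" if (overexposed or oversaturated or blurry) else "good"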

I have attempted training a CNN using the VGGNet architecture but I always seem to get a biased and unreliable model, regardless of the number of epochs or number of steps.

Examples:

"Good" Photo"Bad" Photo

My current model's architecture is very simple (as I am new to the whole machine learning world), but it seemed to work fine with other classification problems, and I have modified it slightly to work better with this binary classification problem:

    # assumed setup (not shown in this snippet): model = Sequential() from keras.models,
    # with Conv2D, Activation, BatchNormalization, MaxPooling2D, Dropout, Flatten and Dense
    # imported from keras.layers; input_shape and channel_dimension are defined elsewhere

    # CONV => RELU => POOL layer set
    # define convolutional layers, use "ReLU" activation function
    # and reduce the spatial size (width and height) with pool layers
    model.add(Conv2D(32, (3, 3), padding="same", input_shape=input_shape)) # 32 3x3 filters (height, width, depth)
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)

    # (CONV => RELU) * 2 => POOL layer set (increasing the number of filters as you go deeper into the CNN)
    model.add(Conv2D(64, (3, 3), padding="same")) # 64 3x3 filters (input_shape is only needed on the first layer)
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(Conv2D(64, (3, 3), padding="same")) # 64 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)

    # (CONV => RELU) * 3 => POOL layer set (input volume size becoming smaller and smaller)
    model.add(Conv2D(128, (3, 3), padding="same")) # 128 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(Conv2D(128, (3, 3), padding="same")) # 128 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(Conv2D(128, (3, 3), padding="same")) # 128 3x3 filters
    model.add(Activation("relu"))
    model.add(BatchNormalization(axis=channel_dimension))
    model.add(MaxPooling2D(pool_size=(2,2)))
    model.add(Dropout(0.25)) # helps prevent overfitting (25% of neurons disconnected randomly)

    # only set of FC => RELU layers
    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation("relu"))
    model.add(BatchNormalization())
    model.add(Dropout(0.5))

    # sigmoid classifier (output layer)
    model.add(Dense(classes))
    model.add(Activation("sigmoid"))

Are there any glaring omissions or mistakes in this model, or can I simply not solve this problem using deep learning (with my current GPU, a GTX 970)?

Thanks for your time and experience,

Josh

EDIT: Here is my code for compiling/training the model:

    # assumed imports (not shown): SGD from keras.optimizers; ModelCheckpoint,
    # ReduceLROnPlateau and TensorBoard from keras.callbacks; time from time

    # initialise the model and optimiser
    print("[INFO] Training network...")
    opt = SGD(lr=initial_lr, decay=initial_lr / epochs)
    model.compile(loss="sparse_categorical_crossentropy", optimizer=opt, metrics=["accuracy"])

    # set up checkpoints
    model_name = "output/50_epochs_{epoch:02d}_{val_acc:.2f}.model"
    checkpoint = ModelCheckpoint(model_name, monitor='val_acc', verbose=1,
                                 save_best_only=True, mode='max')
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                                  patience=5, min_lr=0.001)
    tensorboard = TensorBoard(log_dir="logs/{}".format(time()))
    callbacks_list = [checkpoint, reduce_lr, tensorboard]

    # train the network
    H = model.fit_generator(training_set, steps_per_epoch=500, epochs=50,
                            validation_data=test_set, validation_steps=150,
                            callbacks=callbacks_list)
  • What *exactly* is the problem with your current model? – desertnaut Sep 15 '19 at 10:59
  • I think it's overfitting as, regardless of what input I use to predict from, the results are really biased and very similar. It has exhibited this behaviour regardless of the number of epochs or number of steps which leads me to think that it is a problem with the architecture. For example, with three different images, my results are 0.987 bad and 0.999 good (to 3 significant figures). – Josh Newham Sep 15 '19 at 11:10
  • I don't follow; what is "0.987 bad and 0.999 good"? What is your *accuracy*? – desertnaut Sep 15 '19 at 11:14
  • Ahhh, sorry. My accuracy was about 80% by the end of training my 50 epoch model. I was giving you the predictions from the model previously... – Josh Newham Sep 15 '19 at 11:16
  • Please **include** this info in your post! Assuming we are talking about the *validation* accuracy (please clarify this, too). This is the starting point for diagnosing possible problems, not verbal statements ("*biased and unreliable*") or assumptions ("*I think it's overfitting*")... – desertnaut Sep 15 '19 at 11:20
  • Thanks for your quick response. It was validation accuracy, yes. The original purpose of my post was to ascertain whether it was even feasible to solve this problem using a neural network, I wasn't expecting any instant solutions, but I am extremely grateful for everyone's time and I'm sorry for not being more specific in the post. My metrics looked good and realistic with a training loss value of 0.21 and training accuracy of 0.91 but the predictions were completely unexpected. – Josh Newham Sep 15 '19 at 11:27
  • If `classes=2`, as I suspect, you should not use `sigmoid` in your final layer - see answer – desertnaut Sep 15 '19 at 11:29
  • @JoshNewham May I ask what's the status of your research in this area? Is there any GitHub repo available to look at? Really interesting thread btw! – BPL Mar 28 '20 at 04:49
  • @BPL Unfortunately, I can't release any of the source code yet because it's for a school project and they've got pretty strict guidelines on plagiarism, so it's currently in a private repo, but I will make it available when the embargo has been lifted. As far as the development goes, I think transfer learning is really the best route for situations where there is limited VRAM or computational power, as the model seems to gain momentum more easily and converge towards a reasonable solution on less-than-ideal hardware. – Josh Newham Mar 28 '20 at 07:07
  • @BPL I basically ended up using a tweaked version of Mukul's suggestion and, whilst it wasn't amazingly accurate, it did a reasonable job at predicting "bad" images, considering it's such a subjective thing. This then went into a Node JS web server and was accompanied with a nice image gallery interface... – Josh Newham Mar 28 '20 at 07:10
  • That is what the concept of making an [mre] is for. It allows discussing a specific problem, exhibited by both the original and the MRE, without exposing anything. Please study the info given via the link. – Yunnosch Mar 28 '20 at 07:10
  • @JoshNewham That's sad but understandable (I guess :P). I've got some good use-cases for a tool such as this. For instance, say I've got a procedural image generator that produces thousands of (small-resolution) images per minute; I'd love to be able to score those images with such a tool. Or even better, using a tool like this together with a GAN architecture would be pretty interesting as well. Do you know any GitHub project related to this area of ML? – BPL Mar 28 '20 at 10:49
  • It seems a few GitHub projects are built on top of the well-known [NIMA](https://arxiv.org/pdf/1709.05424.pdf) paper – BPL Mar 28 '20 at 11:03

2 Answers


Independently of any other advice (including the answer already provided), and assuming classes=2 (which you don't clarify - there is a reason we ask for an MCVE here), you seem to be making a fundamental mistake in your final layer, i.e.:

# sigmoid classifier (output layer)
model.add(Dense(classes))
model.add(Activation("sigmoid"))

A sigmoid activation is suitable only if your final layer consists of a single node; if classes=2, as I suspect, based also on your puzzling statement in the comments that

with three different images, my results are 0.987 bad and 0.999 good

and

I was giving you the predictions from the model previously

you should use a softmax activation, i.e.

model.add(Dense(classes))
model.add(Activation("softmax"))

Alternatively, you could use sigmoid, but your final layer should consist of a single node, i.e.

model.add(Dense(1))
model.add(Activation("sigmoid"))

The latter is usually preferred in binary classification settings, but the results should be the same in principle.
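
To make the difference concrete, here is a small numeric illustration (plain NumPy, with made-up logits for the two output nodes): two independent sigmoids can both be close to 1, whereas softmax forces the two scores to sum to 1.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def softmax(z):
        e = np.exp(z - np.max(z))
        return e / e.sum()

    logits = np.array([4.3, 6.9])   # hypothetical raw outputs for the two nodes [bad, good]

    print(sigmoid(logits))          # ~[0.987, 0.999]: both near 1, not a probability distribution
    print(softmax(logits))          # ~[0.069, 0.931]: sums to 1, interpretable as class probabilities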

UPDATE (after updating the question):

sparse_categorical_crossentropy is not the correct loss here, either.

All in all, try the following changes:

model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])

# final layer:
model.add(Dense(1))
model.add(Activation("sigmoid"))

with the Adam optimizer (which needs an import). Also, dropout should not be used by default - see this thread; start without it and only add it if necessary (i.e. if you see signs of overfitting).
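
For reference, a minimal sketch of the suggested compile step with the needed import (assuming the standalone Keras API used in the question); note that with a single sigmoid node and binary_crossentropy, your generators must yield plain 0/1 labels (e.g. class_mode='binary' in flow_from_directory, if that is how training_set is built):

    from keras.optimizers import Adam

    model.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])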

desertnaut
  • Thanks, I believe I have tried it with both softmax and sigmoid activation functions just to see if there was any difference but sigmoid is definitely wrong. Softmax activation still led to overfitting and bias but is much better suited for binary classification. To clarify some parameters, the input size is (320, 240, 3) and there are two possible classes (good and bad). – Josh Newham Sep 15 '19 at 11:36
  • @JoshNewham There is nothing you have posted that leads to an overfitting diagnosis, which (overfitting) means *validation accuracy starts decreasing while training accuracy still increasing*. Again, please try to avoid verbal statements (bias?) and use values instead. Also, pls post your `model.compile()` (not here, edit & update your post). – desertnaut Sep 15 '19 at 11:42
  • I’ve edited the original post to include that information. – Josh Newham Sep 15 '19 at 12:21

I suggest you go for transfer learning instead of training the whole network: use weights trained on a huge dataset like ImageNet.

You can easily do this with Keras: import a model with pre-trained weights, such as Xception, remove its last layer (which represents the 1000 classes of the ImageNet dataset) and replace it with a 2-node dense layer, since you have only 2 classes. Then set trainable=False for the base layers and trainable=True for the custom added layers, such as the 2-node dense layer.

Then you can train the model in the usual way.

Demo code:

    from keras.applications import Xception
    from keras.layers import Dense, GlobalAveragePooling2D
    from keras.models import Model

    # load Xception pre-trained on ImageNet, without its 1000-class top layer
    base_model = Xception(input_shape=(img_width, img_height, 3), weights='imagenet', include_top=False)
    x = base_model.output
    x = GlobalAveragePooling2D()(x)                   # pool the final feature maps into a single vector
    predictions = Dense(2, activation='softmax')(x)   # new 2-class output layer
    model = Model(base_model.input, predictions)

    # freezing the base layer weights so only the new top layers are trained
    for layer in base_model.layers:
        layer.trainable = False
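
You can then compile and fit as usual; a hedged sketch follows (the optimizer and loss here are assumptions, and with the 2-node softmax output the generators need one-hot labels, e.g. class_mode='categorical' in flow_from_directory):

    from keras.optimizers import Adam

    # only the new top layers are updated; the frozen Xception base stays fixed
    model.compile(optimizer=Adam(), loss="categorical_crossentropy", metrics=["accuracy"])
    model.fit_generator(training_set, steps_per_epoch=500, epochs=10,
                        validation_data=test_set, validation_steps=150)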

Mukul
    Great idea. Didn’t even cross my mind to get pretrained weights and then train my model from there. I’ve implemented something with your demo code and, after a very short training period, it seems to be much better than my original model (its predictions are different based on the input). I’ve set it off training for longer to see if I can get the accuracy up... – Josh Newham Sep 15 '19 at 12:25
  • I am glad to hear that; please mark my answer as the accepted solution if it has solved your problem completely. – Mukul Sep 15 '19 at 14:32
  • This is a really interesting thread... As a total newbie in machine learning, is there any open source software I could use to score a dataset of artist-generated images? – BPL Mar 28 '20 at 04:48