
I was working through a Keras implementation of the age and gender detection model described in the research paper 'Age and Gender Classification using Convolutional Neural Networks'. It was originally a Caffe model, and I decided to convert it to Keras. While training, however, the model's accuracy got stuck around 49–52%, which means the model is not learning at all. The loss can also be seen increasing exponentially, and at times it becomes nan. I was training on Google Colab with the GPU hardware accelerator.

My input was a folder of images whose labels are encoded in their file names. I loaded all the images into a NumPy array, and each label was a collection of 10 elements (2 for gender and 8 for the 8 different age groups described in the paper).
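For reference, here is one way such a 10-element target could be assembled from a filename. The filename convention, the constant names, and the `make_label` helper below are assumptions for illustration, not the original code:

```python
import numpy as np

GENDERS = ['Male', 'Female']
AGE_GROUPS = ['0-2', '4-6', '8-12', '15-20', '25-32', '38-43', '48-53', '60-100']

def make_label(gender, age_group):
    # 2 gender slots + 8 age-group slots -> one 10-element multi-hot vector
    label = np.zeros(10, dtype='float32')
    label[GENDERS.index(gender)] = 1.0
    label[2 + AGE_GROUPS.index(age_group)] = 1.0
    return label

# e.g. for a file named something like "Male_25-32_0001.jpg"
lbl = make_label('Male', '25-32')
```

Note that a vector built this way contains two 1s (one for gender, one for age group), while a single softmax head trained with categorical crossentropy assumes exactly one positive class per sample; that mismatch alone can keep a single-head model from learning.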

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, BatchNormalization,
                                     Flatten, Dense, Dropout)
from tensorflow.keras.initializers import RandomNormal
from tensorflow.keras.optimizers import SGD

model = Sequential()
model.add(Conv2D(96, (7, 7),
                 activation='relu',
                 strides=4,
                 use_bias=True,
                 bias_initializer='zeros',
                 data_format='channels_last',
                 kernel_initializer=RandomNormal(stddev=0.01),
                 input_shape=(200, 200, 3)))
model.add(MaxPooling2D(pool_size=3, strides=2))
model.add(BatchNormalization())

model.add(Conv2D(256, (5, 5),
                 activation='relu',
                 strides=1,
                 use_bias=True,
                 data_format='channels_last',
                 bias_initializer='ones',
                 kernel_initializer=RandomNormal(stddev=0.01)))
model.add(MaxPooling2D(pool_size=3, strides=2))
model.add(BatchNormalization())

model.add(Conv2D(384, (3, 3),
                 strides=1,
                 data_format='channels_last',
                 use_bias=True,
                 bias_initializer='zeros',
                 padding='same',
                 kernel_initializer=RandomNormal(stddev=0.01),
                 activation='relu'))
model.add(MaxPooling2D(pool_size=3, strides=2))

model.add(Flatten())
model.add(Dense(512,
                use_bias=True,
                bias_initializer='ones',
                kernel_initializer=RandomNormal(stddev=0.05),
                activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(512,
                use_bias=True,
                bias_initializer='ones',
                kernel_initializer=RandomNormal(stddev=0.05),
                activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(10,
                use_bias=True,
                kernel_initializer=RandomNormal(stddev=0.01),
                bias_initializer='zeros',
                activation='softmax'))

model.compile(loss='categorical_crossentropy',
              metrics=['accuracy'],
              optimizer=SGD(learning_rate=1e-4, decay=1e-7, nesterov=False))
model.summary()
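One common cause of a diverging loss with raw images is feeding unnormalized 0–255 pixel values into a network initialized with small random weights; the first-layer pre-activations can then be large enough for SGD updates to overshoot and eventually produce nan. A minimal sketch of scaling inputs to [0, 1] (the `images` array here is a random stand-in for the real batch):

```python
import numpy as np

# toy stand-in for the loaded image batch (uint8, values 0-255)
images = np.random.randint(0, 256, size=(4, 200, 200, 3), dtype=np.uint8)

# scale to [0, 1] so early-layer activations stay in a range SGD can handle
X = images.astype('float32') / 255.0
```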

Inputs to the model were shuffled:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(images, labels, test_size=0.2, shuffle=True, random_state=42)

You can see my training results here. I have used what I believe are the correct optimizer and initializers, along with biases, to prevent vanishing gradients.

  • You have taken care of everything but failed to notice that the `loss` had become `nan` from the second epoch itself. Solving this might also help you get better accuracy. – learner Apr 26 '20 at 11:54
  • I was going to suggest the same. Keep track of your loss and of the reasons why you get nan on epochs 2+. This thread might be a good place to start searching: https://stackoverflow.com/questions/61416197/pretraining-a-language-model-on-a-small-custom-corpus – inverted_index Apr 26 '20 at 19:50
  • Is your target one-hot encoded? Can you show all 10 of your labels? –  Apr 27 '20 at 11:23
  • My labels are of the format: y = ['Male','Female','0 – 2', '4 – 6', '8 – 12', '15 – 20', '25 – 32', '38 – 43','48 – 53', '60 – 100']. It is in the form 0/1. – Aditya Gupta Apr 28 '20 at 06:15
  • I tried using the Adam optimizer and the tanh activation, but no progress. I can't figure out why my loss is nan. – Aditya Gupta Apr 28 '20 at 06:31
  • However, when I trained a multi-output network, with one output layer for age and another for gender, the gender output was learning fine, but the age validation accuracy was stuck at 50%. – Aditya Gupta Apr 28 '20 at 06:35

1 Answer


I would suggest the following approach to improve the accuracy of the model:

  • Build two different models, one for gender prediction and another for age prediction.
  • Use a label encoder or one-hot encoder on the target variables.
  • For the gender prediction model, use binary crossentropy as the loss function.
  • For the age prediction model, use categorical crossentropy (if you one-hot encoded the target variable) or sparse categorical crossentropy (if you label-encoded the target variable to integer classes).
  • Before building the model, normalize all the numerical data.
  • Use softmax as the activation function in the final layer and relu in the remaining layers.
  • Also, instead of 2 hidden dense layers, keep just 1 (more dense layers mean more weights to learn; you can experiment with the number of layers and filters).
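The loss pairing above can be checked numerically: categorical crossentropy expects one-hot targets, while sparse categorical crossentropy expects integer class indices, and the two agree when the labels match. A minimal NumPy sketch (the probability and label values here are made up for illustration):

```python
import numpy as np

def categorical_crossentropy(y_onehot, probs):
    # expects one-hot targets: loss = -sum(y * log(p)) per sample
    return -np.sum(y_onehot * np.log(probs), axis=-1)

def sparse_categorical_crossentropy(y_int, probs):
    # expects integer class indices: loss = -log(p[true class]) per sample
    return -np.log(probs[np.arange(len(y_int)), y_int])

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
y_int = np.array([0, 1])        # label-encoded targets
y_onehot = np.eye(3)[y_int]     # one-hot-encoded targets

# both formulations give the same per-sample loss on matching labels
assert np.allclose(categorical_crossentropy(y_onehot, probs),
                   sparse_categorical_crossentropy(y_int, probs))
```

In Keras these correspond to `loss='categorical_crossentropy'` and `loss='sparse_categorical_crossentropy'` respectively; using the wrong pairing typically raises a shape error or silently trains on garbage targets.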

Hope I have answered your question. Happy Learning!
