I have designed a simple ConvNet in Keras that classifies images into two classes. The images are not real-world photos; they are signals converted to images. The model trains well, reaching 99% training accuracy and 90% test accuracy.
The model summary is as follows:
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_5 (Conv2D) (None, 64, 64, 32) 832
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 32, 32, 32) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 32, 32, 32) 25632
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 16, 16, 32) 0
_________________________________________________________________
conv2d_7 (Conv2D) (None, 16, 16, 64) 51264
_________________________________________________________________
max_pooling2d_6 (MaxPooling2 (None, 8, 8, 64) 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 5, 5, 64) 65600
_________________________________________________________________
flatten_2 (Flatten) (None, 1600) 0
_________________________________________________________________
dense_12 (Dense) (None, 256) 409856
_________________________________________________________________
dense_13 (Dense) (None, 32) 8224
_________________________________________________________________
dense_14 (Dense) (None, 2) 66
=================================================================
Total params: 561,474
Trainable params: 561,474
Non-trainable params: 0
_________________________________________________________________
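For context, the parameter counts in the summary are consistent with 5x5 kernels on the first three convolutions, a 4x4 kernel on the last one, and a single-channel 64x64 input (the kernel sizes and channel count are inferred from the counts, not stated above):

```python
# Sanity-checking the "Param #" column in the summary.
# Conv2D params: kernel_h * kernel_w * in_channels * filters + filters (biases)
# Dense params:  in_units * out_units + out_units (biases)

def conv_params(kh, kw, cin, cout):
    return kh * kw * cin * cout + cout

def dense_params(nin, nout):
    return nin * nout + nout

counts = {
    "conv2d_5": conv_params(5, 5, 1, 32),     # 832   -> implies 5x5 kernel, 1-channel input
    "conv2d_6": conv_params(5, 5, 32, 32),    # 25632
    "conv2d_7": conv_params(5, 5, 32, 64),    # 51264
    "conv2d_8": conv_params(4, 4, 64, 64),    # 65600 -> 4x4 kernel: 8 - 4 + 1 = 5 output
    "dense_12": dense_params(5 * 5 * 64, 256),  # flatten yields 1600 features
    "dense_13": dense_params(256, 32),
    "dense_14": dense_params(32, 2),
}

print(sum(counts.values()))  # 561474, matching "Total params"
```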
The problem is that when I print the outputs of the hidden layers dense_12 and dense_13 over the data, many dimensions have zero variance. In my opinion, this means those outputs have no effect on the decision; in other words, they are useless features that the network ignores. The sample code for obtaining these hidden-layer outputs is as follows:
from keras import backend
import numpy as np

# Functions mapping the model input to the two hidden dense-layer activations
dense_2_output = backend.function([model.input], [model.layers[-2].output])  # dense_13
dense_1_output = backend.function([model.input], [model.layers[-3].output])  # dense_12

# Activations over the training set, shape (samples, units)
train_weights_2 = dense_2_output([grid_train_image])[0]
train_weights_1 = dense_1_output([grid_train_image])[0]

# Per-unit variance across samples
x_2 = np.var(train_weights_2, axis=0)
x_1 = np.var(train_weights_1, axis=0)
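From the per-unit variances, the dead units can be listed directly. A minimal sketch, using synthetic activations as a stand-in for train_weights_1/train_weights_2:

```python
import numpy as np

# Synthetic stand-in for the dense-layer activations: shape (samples, units),
# with some columns identically zero to mimic dead units.
rng = np.random.default_rng(0)
acts = rng.random((100, 32))
acts[:, [3, 7, 20]] = 0.0                    # three "dead" units

per_unit_var = np.var(acts, axis=0)          # same as np.var(train_weights, 0)
dead = np.flatnonzero(per_unit_var == 0.0)   # indices of zero-variance units
alive = np.flatnonzero(per_unit_var > 0.0)

print(dead)                                  # -> [ 3  7 20]
reduced = acts[:, alive]                     # feature matrix with dead columns dropped
```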
Thinking of this model as a kind of classic SVM, features with zero variance are useless and should be omitted. Does anyone have an idea how to remove these useless features so that the feature space is fully functional?
Note: I also tried shrinking the dense layers from (256, 32) to (180, 20) units and retraining the system. There are still many zero-variance features, and the system's performance decreased slightly. If you can point me to further reading on this or know the answer, I would appreciate any help.
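To make concrete the kind of "omitting" I have in mind, here is a minimal NumPy sketch (not Keras-specific, and not my actual code): a unit whose output has zero variance is constant, so its constant contribution can be folded into the next layer's bias and the matching weight column/row deleted without changing the network's function.

```python
import numpy as np

# Sketch: removing a constant-output ("zero-variance") unit from a dense layer.
# If unit j always outputs the constant c, its contribution to the next layer
# is c * W2[j, :], which can be absorbed into the next layer's bias.

rng = np.random.default_rng(1)
n_in, n_hidden, n_out = 8, 5, 3
W1, b1 = rng.normal(size=(n_in, n_hidden)), rng.normal(size=n_hidden)
W2, b2 = rng.normal(size=(n_hidden, n_out)), rng.normal(size=n_out)

def relu(z):
    return np.maximum(z, 0.0)

# Force unit 2 to be dead: zero weights and a negative bias give ReLU output 0.
W1[:, 2] = 0.0
b1[2] = -1.0

x = rng.normal(size=(10, n_in))
h = relu(x @ W1 + b1)
assert np.var(h[:, 2]) == 0.0             # unit 2 has zero variance

c = h[0, 2]                               # its constant output (here 0.0)
keep = [j for j in range(n_hidden) if j != 2]

W1p, b1p = W1[:, keep], b1[keep]          # drop the dead column
W2p = W2[keep, :]                         # drop the matching row
b2p = b2 + c * W2[2, :]                   # fold the constant through

full = relu(x @ W1 + b1) @ W2 + b2
pruned = relu(x @ W1p + b1p) @ W2p + b2p
assert np.allclose(full, pruned)          # identical outputs, one unit fewer
```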