I'm experimenting with CNNs and I'm baffled, because the model I've built learns more slowly and performs worse than a fully connected network. Here are the two models:
fully connected:
hidden1 = tf.layers.dense(X, 2000, name="hidden1",
                          activation=tf.nn.relu)
hidden2 = tf.layers.dense(hidden1, 1000, name="hidden2",
                          activation=tf.nn.relu)
hidden3 = tf.layers.dense(hidden2, 1000, name="hidden3",
                          activation=tf.nn.relu)
hidden4 = tf.layers.dense(hidden3, 1000, name="hidden4",
                          activation=tf.nn.relu)
hidden5 = tf.layers.dense(hidden4, 700, name="hidden5",
                          activation=tf.nn.relu)
hidden6 = tf.layers.dense(hidden5, 500, name="hidden6",
                          activation=tf.nn.relu)
logits = tf.layers.dense(hidden6, 2, name="outputs")
CNN:
f = tf.get_variable('conv1-fil', [5, 5, 1, 10])
conv1 = tf.nn.conv2d(X, filter=f, strides=[1, 1, 1, 1], padding="SAME")
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
f2 = tf.get_variable('conv2-fil', [3, 3, 10, 7])
conv2 = tf.nn.conv2d(pool1, filter=f2, strides=[1, 1, 1, 1], padding="SAME")
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
fc1 = tf.contrib.layers.flatten(pool2)
hidden1 = tf.layers.dense(fc1, 3630, name="hidden1",
                          activation=tf.nn.relu)
hidden2 = tf.layers.dense(hidden1, 2000, name="hidden2",
                          activation=tf.nn.relu)
hidden3 = tf.layers.dense(hidden2, 1000, name="hidden3",
                          activation=tf.nn.relu)
hidden5 = tf.layers.dense(hidden3, 700, name="hidden5",
                          activation=tf.nn.relu)
hidden6 = tf.layers.dense(hidden5, 500, name="hidden6",
                          activation=tf.nn.relu)
logits = tf.layers.dense(hidden6, 2, name="outputs")
Basically, the CNN has a slightly shallower fully connected part, with two conv layers added in front of it. After the same number of epochs on the same dataset, the CNN reaches ~88% accuracy versus 92% for the fully connected network. How do I debug issues like this? What are good practices for designing conv layers?
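To make the comparison concrete, here is a minimal sketch in plain Python (no TensorFlow needed) that traces the tensor shapes through the conv/pool stack and counts trainable parameters for both models. The 64x64 single-channel input is an assumption for illustration only, and the helper names (`conv_same`, `pool_valid`, `dense_params`, `cnn_summary`, `fc_summary`) are hypothetical, not part of any library.

```python
def conv_same(h, w, stride=1):
    # "SAME" padding: output size is ceil(input / stride).
    return -(-h // stride), -(-w // stride)


def pool_valid(h, w, k=2, stride=2):
    # "VALID" padding: output size is floor((input - k) / stride) + 1.
    return (h - k) // stride + 1, (w - k) // stride + 1


def dense_params(n_in, n_out):
    # tf.layers.dense uses a bias by default: weights + biases.
    return n_in * n_out + n_out


def stack_params(sizes):
    # Total parameters of consecutive dense layers with the given widths.
    return sum(dense_params(a, b) for a, b in zip(sizes, sizes[1:]))


def cnn_summary(h, w):
    # conv1 (SAME) -> pool1 (VALID) -> conv2 (SAME) -> pool2 (VALID)
    h, w = pool_valid(*conv_same(h, w))
    h, w = pool_valid(*conv_same(h, w))
    flat = h * w * 7  # conv2 has 7 output channels
    # tf.nn.conv2d is given only filter variables here, so no biases.
    conv = 5 * 5 * 1 * 10 + 3 * 3 * 10 * 7
    dense = stack_params([flat, 3630, 2000, 1000, 700, 500, 2])
    return flat, conv, dense


def fc_summary(n_inputs):
    return stack_params([n_inputs, 2000, 1000, 1000, 1000, 700, 500, 2])


if __name__ == "__main__":
    H, W = 64, 64  # assumed input size, purely illustrative
    flat, conv, dense = cnn_summary(H, W)
    print("flattened features after pool2:", flat)
    print("conv parameters:", conv)
    print("CNN dense parameters:", dense)
    print("fully connected parameters:", fc_summary(H * W))
```

With the real input size substituted for `H` and `W`, this shows how many features reach `hidden1` in the CNN and how the parameters are split between the conv filters and the dense layers in each model.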