
I'm experimenting with CNNs and I'm baffled, because the model I've built actually learns slower and performs worse than a fully connected NN. Here are the two models:

fully connected:

hidden1 = tf.layers.dense(X, 2000, name="hidden1",                          
                          activation=tf.nn.relu)                            
hidden2 = tf.layers.dense(hidden1, 1000, name="hidden2",                    
                          activation=tf.nn.relu)                            
hidden3 = tf.layers.dense(hidden2, 1000, name="hidden3",                    
                          activation=tf.nn.relu)                            
hidden4 = tf.layers.dense(hidden3, 1000, name="hidden4",                    
                          activation=tf.nn.relu)                            
hidden5 = tf.layers.dense(hidden4, 700, name="hidden5",                     
                          activation=tf.nn.relu)                            
hidden6 = tf.layers.dense(hidden5, 500, name="hidden6",                     
                          activation=tf.nn.relu)                            
logits = tf.layers.dense(hidden6, 2, name="outputs")

CNN:

f = tf.get_variable('conv1-fil', [5,5,1,10])                                
conv1 = tf.nn.conv2d(X, filter=f, strides=[1, 1, 1, 1], padding="SAME")        
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
f2 = tf.get_variable('conv2-fil', [3,3,10,7])                               
conv2 = tf.nn.conv2d(pool1, filter=f2, strides=[1, 1, 1, 1], padding="SAME")
pool2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="VALID")
fc1 = tf.contrib.layers.flatten(pool2)
hidden1 = tf.layers.dense(fc1, 3630, name="hidden1",                        
                          activation=tf.nn.relu)                            
hidden2 = tf.layers.dense(hidden1, 2000, name="hidden2",                    
                          activation=tf.nn.relu)                            
hidden3 = tf.layers.dense(hidden2, 1000, name="hidden3",                    
                          activation=tf.nn.relu)                            
hidden5 = tf.layers.dense(hidden3, 700, name="hidden5",                     
                          activation=tf.nn.relu)                            
hidden6 = tf.layers.dense(hidden5, 500, name="hidden6",                     
                          activation=tf.nn.relu)                            
logits = tf.layers.dense(hidden6, 2, name="outputs")

Basically the CNN has a slightly shallower fully connected net, but adds conv layers in front of it. The CNN reaches ~88% accuracy vs. 92% for the deep NN after the same number of epochs on the same dataset. How do I debug issues like this? What are good practices in designing conv layers?
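One sanity check that came up while comparing the two architectures is the trainable-parameter count. The input image size isn't stated in the question, so this is only a rough sketch in plain Python (no TensorFlow needed) that counts the dense layers from the first hidden layer onward, using the layer widths from the code above:

```python
# Rough parameter count for a stack of dense layers (weights + biases).
# The input image size is not given in the question, so the count starts
# at the first hidden layer; the true totals would add input_size * 2000
# (FC net) and flatten_size * 3630 (CNN) on top of these.

def dense_params(sizes):
    """Parameters for consecutive dense layers: in*out weights + out biases."""
    return sum(i * o + o for i, o in zip(sizes, sizes[1:]))

# Fully connected net, hidden1 -> logits:
fc_tail = dense_params([2000, 1000, 1000, 1000, 700, 500, 2])

# CNN's dense head, hidden1 -> logits:
cnn_tail = dense_params([3630, 2000, 1000, 700, 500, 2])

# The conv filters themselves are tiny by comparison:
conv_weights = 5 * 5 * 1 * 10 + 3 * 3 * 10 * 7  # 880 weights

print(fc_tail, cnn_tail, conv_weights)
```

Note that most of the difference between the two models therefore comes from the missing input layer terms, which depend on the image resolution.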

  • Just to make sure, you compare validation set accuracies, not train set accuracies? Which of these networks has a higher number of trainable parameters (see e.g. https://stackoverflow.com/questions/38160940)? – Andre Holzner Jan 07 '18 at 10:09
  • 1
  • 'Debug' is not the appropriate term here. And why should a CNN give better (or equal) performance? Can you provide some reference that makes you believe so? – desertnaut Jan 07 '18 at 10:38
  • Yes, it was all cross-validated, with a test set not used for training; accuracies for both training and test are very close to each other, so I don't think it's overfitting. I've noticed that the CNN has almost 10M fewer trainable variables. I'll play around with this and see how it affects overall accuracy (funny, because it's much slower than the DNN). Thank you. As for the CNN being better than fully connected: it is my understanding that CNNs are generally what's used right now for image prediction. I understand, of course, that they're not a silver bullet that will automatically work, but that's why I'm asking... – inc0 Jan 07 '18 at 20:33
  • What's the classification task you're trying? (What do your images represent?) How many images do you have? How many classes? – Soltius Dec 17 '18 at 10:26

0 Answers