My model is a CNN with several batch normalization (BN) and dropout (DO) layers. Originally, I accidentally put model.train() outside of the training loop, like this:
model.train()
for e in range(num_epochs):
    # train model
    model.eval()
    # eval model
For the record, the code above trained well and performed decently on the validation set:
[CV:02][E:001][I:320/320] avg. Loss: 0.460897, avg. Acc: 0.742746, test. acc: 0.708046(max: 0.708046)
[CV:02][E:002][I:320/320] avg. Loss: 0.389883, avg. Acc: 0.798791, test. acc: 0.823563(max: 0.823563)
[CV:02][E:003][I:320/320] avg. Loss: 0.319034, avg. Acc: 0.825559, test. acc: 0.834914(max: 0.834914)
[CV:02][E:004][I:320/320] avg. Loss: 0.301322, avg. Acc: 0.834254, test. acc: 0.834052(max: 0.834914)
[CV:02][E:005][I:320/320] avg. Loss: 0.292184, avg. Acc: 0.839575, test. acc: 0.835201(max: 0.835201)
[CV:02][E:006][I:320/320] avg. Loss: 0.285467, avg. Acc: 0.842266, test. acc: 0.837931(max: 0.837931)
[CV:02][E:007][I:320/320] avg. Loss: 0.279607, avg. Acc: 0.844917, test. acc: 0.829885(max: 0.837931)
[CV:02][E:008][I:320/320] avg. Loss: 0.275252, avg. Acc: 0.846443, test. acc: 0.827874(max: 0.837931)
[CV:02][E:009][I:320/320] avg. Loss: 0.270719, avg. Acc: 0.848150, test. acc: 0.822989(max: 0.837931)
While reviewing the code, however, I realized this was a mistake: the model.eval() call at the end of the first epoch leaves the BN and DO layers in eval mode for every epoch after the first.
So, I moved the model.train() call inside the loop:
for e in range(num_epochs):
    model.train()
    # train model
    model.eval()
    # eval model
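For anyone following along: as I understand it, model.train() and model.eval() just flip a training flag recursively on every submodule, and BN/Dropout read that flag at forward time. A minimal standalone illustration (a generic toy model, not my actual network):

```python
import torch.nn as nn

# Tiny model containing the two layer types affected by train()/eval().
model = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.Dropout(0.5))

model.train()  # sets .training = True on every submodule
print(all(m.training for m in model.modules()))   # True: BN uses batch stats, Dropout is active

model.eval()   # sets .training = False on every submodule
print(any(m.training for m in model.modules()))   # False: BN uses running stats, Dropout is a no-op
```

So in my original buggy loop, everything after the first epoch ran with all flags set to False.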
At this point, the model learned relatively poorly and seemed to overfit, as you can see in the output below. It reached higher training accuracy but significantly lower validation accuracy, which struck me as odd given the usual regularizing effect of BN and DO:
[CV:02][E:001][I:320/320] avg. Loss: 0.416946, avg. Acc: 0.750477, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.329121, avg. Acc: 0.798992, test. acc: 0.690948(max: 0.690948)
[CV:02][E:003][I:320/320] avg. Loss: 0.305688, avg. Acc: 0.829053, test. acc: 0.719540(max: 0.719540)
[CV:02][E:004][I:320/320] avg. Loss: 0.290048, avg. Acc: 0.840539, test. acc: 0.741954(max: 0.741954)
[CV:02][E:005][I:320/320] avg. Loss: 0.279873, avg. Acc: 0.848872, test. acc: 0.745833(max: 0.745833)
[CV:02][E:006][I:320/320] avg. Loss: 0.270934, avg. Acc: 0.854274, test. acc: 0.742960(max: 0.745833)
[CV:02][E:007][I:320/320] avg. Loss: 0.263515, avg. Acc: 0.856945, test. acc: 0.741667(max: 0.745833)
[CV:02][E:008][I:320/320] avg. Loss: 0.256854, avg. Acc: 0.858672, test. acc: 0.734483(max: 0.745833)
[CV:02][E:009][I:320/320] avg. Loss: 0.252013, avg. Acc: 0.861363, test. acc: 0.723707(max: 0.745833)
[CV:02][E:010][I:320/320] avg. Loss: 0.245525, avg. Acc: 0.865519, test. acc: 0.711494(max: 0.745833)
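One thing I suspect (an assumption on my part, not something I've verified on my data) is that the gap comes from BN's running statistics differing from the per-batch statistics, since that is exactly what changes between a train() and an eval() forward pass. A minimal probe of that difference:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)

# Feed a few "training" batches so the running estimates get updated.
bn.train()
for _ in range(5):
    bn(torch.randn(16, 4) * 3 + 1)  # batches with mean ~1, std ~3

# Train mode normalizes with the current batch's stats;
# eval mode normalizes with the running estimates accumulated above.
x = torch.randn(16, 4) * 3 + 1
bn.train()
out_train = bn(x)
bn.eval()
out_eval = bn(x)
print((out_train - out_eval).abs().max())  # nonzero: the two modes disagree
```

If that gap is large on real data, a model trained in train mode can look much worse when evaluated in eval mode.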
So I thought to myself, "I guess the BN and DO layers are hurting my model," and removed them. However, the model didn't perform well without them either; in fact, it didn't seem to be learning anything:
[CV:02][E:001][I:320/320] avg. Loss: 0.552687, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.506234, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:003][I:320/320] avg. Loss: 0.503373, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:004][I:320/320] avg. Loss: 0.502966, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:005][I:320/320] avg. Loss: 0.502870, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:006][I:320/320] avg. Loss: 0.502832, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:007][I:320/320] avg. Loss: 0.502800, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:008][I:320/320] avg. Loss: 0.502765, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
I was very confused at this point, so I carried out another experiment: I put the BN and DO layers back into the model and tested the following:
for e in range(num_epochs):
    model.eval()
    # train model
    # eval model
It worked poorly:
[CV:02][E:001][I:320/320] avg. Loss: 0.562196, avg. Acc: 0.744774, test. acc: 0.689080(max: 0.689080)
[CV:02][E:002][I:320/320] avg. Loss: 0.506071, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:003][I:320/320] avg. Loss: 0.503234, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:004][I:320/320] avg. Loss: 0.502916, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:005][I:320/320] avg. Loss: 0.502859, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
[CV:02][E:006][I:320/320] avg. Loss: 0.502838, avg. Acc: 0.749071, test. acc: 0.689080(max: 0.689080)
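For what it's worth, model.eval() doesn't stop learning; it only changes the forward-pass behavior of BN and Dropout, and gradients still flow to the weights. That is presumably why the eval-mode loop trains at all. A quick self-contained check (generic toy model, not my actual one):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4), nn.Dropout(0.5))
model.eval()  # Dropout is a no-op, BN uses running stats

# A forward/backward pass still populates gradients in eval mode.
out = model(torch.randn(8, 4)).sum()
out.backward()
print(all(p.grad is not None for p in model.parameters()))  # True
```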
I repeated the above experiments multiple times, and the results were always close to the outputs posted above. (The data I'm working with is pretty simple.)
To summarize, the model works best in a very particular setting:
- Batch normalization and dropout are present in the model. (This is expected.)
- model.train() is on only for the first epoch. (Weird, especially combined with the next point.)
- model.eval() is on for all remaining epochs. (Weird as well.)
To be honest, I would never have set up the training procedure this way on purpose (I don't think anyone would), but for some reason it works well. Has anybody experienced anything similar? Any guidance as to why the model behaves this way would be much appreciated!
Thanks in advance!!