What a Flatten layer does
After convolutional operations, tf.keras.layers.Flatten will reshape a tensor into (n_samples, height*width*channels), for example turning (16, 28, 28, 3) into (16, 2352). Let's try it:
import tensorflow as tf
# A fake batch of 100 28x28 RGB "images".
x = tf.random.uniform(shape=(100, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32)
flat = tf.keras.layers.Flatten()
flat(x).shape
TensorShape([100, 2352])
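Under the hood this is just a reshape that keeps the batch axis. Here is a minimal sketch verifying that equivalence (the check itself is not from the original):
import tensorflow as tf
x = tf.random.uniform(shape=(100, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32)
# Flatten keeps axis 0 (the batch) and collapses all remaining axes into one.
manual = tf.reshape(x, (tf.shape(x)[0], -1))
flat = tf.keras.layers.Flatten()(x)
print(tf.reduce_all(manual == flat).numpy())  # True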
What a GlobalAveragePooling layer does
After convolutional operations, what the tf.keras.layers.GlobalAveragePooling2D layer does is average all the values over the spatial axes (height and width), keeping the channel axis. This means that the resulting shape will be (n_samples, n_channels). For instance, if your last convolutional layer had 64 filters, it would turn (16, 7, 7, 64) into (16, 64). Let's make the test, after a few convolutional operations:
import tensorflow as tf
# Cast to float32 so Conv2D can operate on the fake image batch.
x = tf.cast(
    tf.random.uniform(shape=(16, 28, 28, 3), minval=0, maxval=256, dtype=tf.int32),
    tf.float32)
gap = tf.keras.layers.GlobalAveragePooling2D()
for i in range(5):
    # Each unpadded 3x3 convolution shrinks height and width by 2.
    conv = tf.keras.layers.Conv2D(64, 3)
    x = conv(x)
    print(x.shape)
print(gap(x).shape)
(16, 24, 24, 64)
(16, 22, 22, 64)
(16, 20, 20, 64)
(16, 18, 18, 64)
(16, 16, 16, 64)
(16, 64)
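If you want to convince yourself of what GlobalAveragePooling2D computes, it matches a plain mean over the spatial axes. A minimal sketch of that check:
import numpy as np
import tensorflow as tf
x = tf.random.uniform(shape=(16, 16, 16, 64))
gap = tf.keras.layers.GlobalAveragePooling2D()
# Averaging over axes 1 and 2 (height and width) keeps (n_samples, n_channels).
manual = tf.reduce_mean(x, axis=[1, 2])
print(manual.shape)                  # (16, 64)
print(np.allclose(manual, gap(x)))   # True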
Which should you use?
The Flatten layer itself has no weights, but it will always hand at least as many features to the next Dense layer as GlobalAveragePooling2D does, so that Dense layer ends up with at least as many parameters. If the final tensor shape before flattening is still large, for instance (16, 240, 240, 128), using Flatten will produce an insane number of features: 240*240*128 = 7,372,800. This huge number will be multiplied by the number of units in your next dense layer! In that situation, GlobalAveragePooling2D is usually preferred. If you used MaxPooling2D and Conv2D so much that your tensor shape before flattening is already like (16, 1, 1, 128), it won't make a difference. If you're overfitting, you might want to try GlobalAveragePooling2D.
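To see the difference concretely, here is a minimal sketch comparing the parameter counts of a dense head after each layer (the (240, 240, 128) shape and the 10-unit Dense head are made up for illustration):
import tensorflow as tf

def head(pooling_layer):
    # A hypothetical large feature map coming out of a conv backbone.
    inputs = tf.keras.Input(shape=(240, 240, 128))
    x = pooling_layer(inputs)
    outputs = tf.keras.layers.Dense(10)(x)
    return tf.keras.Model(inputs, outputs)

flat_head = head(tf.keras.layers.Flatten())
gap_head = head(tf.keras.layers.GlobalAveragePooling2D())
print(flat_head.count_params())  # 73728010 = 7,372,800 features * 10 units + 10 biases
print(gap_head.count_params())   # 1290 = 128 features * 10 units + 10 biases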