I'm new to Keras and am doing a basic Kaggle tutorial (The Digit Recognizer). I am struggling to understand what to actually put into a Dense layer. I have found this post to be very helpful, but my understanding isn't quite there yet.
In my Sequential model, I'm starting off with a Dense layer. However, some posts say that the first layer must have an input_shape, whereas plenty of Kaggle submissions and other examples don't specify one.
- Is an input_shape actually required in the first layer? Is it required at all?
- A Dense layer's first argument is units. For the life of me, I cannot find a solid explanation of what this argument should actually be. Is there some formula to run here based on the shape of your data? Sometimes I see rather large numbers (something like 784) in the first Dense layer, whereas in other cases it's small (something like 10). Or is it a total guess?
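To make the second question concrete, here is a rough NumPy sketch of what I currently believe units controls. It assumes a Dense layer computes activation(x @ W + b) with W of shape (input_features, units); the dense helper and the fake image data are hypothetical stand-ins I made up for illustration, not Keras code.

```python
import numpy as np

def dense(x, units, rng):
    """Toy stand-in for a Dense layer with ReLU: relu(x @ W + b)."""
    in_features = x.shape[-1]                 # taken from the data, not chosen by me
    W = rng.normal(size=(in_features, units)) * 0.01
    b = np.zeros(units)
    return np.maximum(x @ W + b, 0.0)

rng = np.random.default_rng(0)
images = rng.random((5, 28, 28))              # 5 fake 28x28 grayscale digits
flat = images.reshape(len(images), -1)        # roughly what Flatten does
hidden = dense(flat, 64, rng)                 # units=64 sets the output width
scores = dense(hidden, 10, rng)               # units=10: one column per digit class

print(flat.shape, hidden.shape, scores.shape)  # (5, 784) (5, 64) (5, 10)
```

If this picture is right, units only sets how wide the layer's output is, while the input width is read off the data. That would explain where 784 (28 x 28) and 10 (the number of digit classes) come from, but not how to pick the sizes in between.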
I understand that there isn't a "this is what you do for this type of data" approach to building a predictive model, but I can't understand how to even take an educated guess at what numbers to plug in here.
Here is my current model:
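import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

model = Sequential()
model.add(Flatten())  # flatten each sample into a 1-D vector
model.add(Dense(64, activation=tf.nn.relu))  # hidden layer with 64 units
model.add(Dense(10, activation=tf.nn.softmax))  # one output per digit class
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
x_val = x_train[:10000]
y_val = y_train[:10000]  # labels aligned with x_val
model.fit(x_train, y_train, epochs=3) # , validation_data=(x_val, y_val))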
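As a sanity check on what the two numbers in this model imply, here is a small sketch that counts the trainable parameters by hand. It assumes the inputs are 28x28 images (so Flatten yields 784 features) and that each Dense layer has one weight per input/unit pair plus one bias per unit; dense_params is a hypothetical helper, not part of Keras.

```python
def dense_params(in_features, units):
    # one weight per (input, unit) pair plus one bias per unit
    return in_features * units + units

flat_size = 28 * 28                           # assumes 28x28 images -> 784 features
hidden_params = dense_params(flat_size, 64)   # Dense(64) after Flatten
output_params = dense_params(64, 10)          # Dense(10) after Dense(64)
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 50240 650 50890
```

So the value of units in the first Dense layer dominates the model size, which is part of why I'd like a principled way to choose it.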
My model performs decently (I think): it misses only about 350 out of 8400 images. I've got that down to about 220 by adding more layers, changing the numbers, using Dropout, etc.
I'd really like to understand how to choose the numbers I should be plugging in. And also, do I need an input_shape?