I'm new to Keras and am doing a basic Kaggle tutorial (The Digit Recognizer). I am struggling to understand what to actually put into a Dense layer. I have found this post to be very helpful, but my understanding isn't quite there yet.

In my Sequential model, I'm starting off with a Dense layer. But, I see some posts saying that the first layer must have an input_shape whereas I see plenty of Kaggle submissions and other examples that don't adhere to this.

  1. Is an input_shape actually required in the first layer? Is it required at all?
  2. A Dense layer's first argument is units. For the life of me, I cannot find a solid explanation of what this argument should actually be. Is there some formula to run here based on the shape of your data? Sometimes I see rather large numbers (something like 784) in the first Dense layer, whereas in other cases it's small (something like 10). Or is it a total guess?

I understand that there isn't a "this is what you do for this type of data" approach to building a predictive model, but I can't understand how to even take an educated guess at what numbers to plug in here.

Here is my current model:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten

model = Sequential()
model.add(Flatten())  # note: no input_shape given here
model.add(Dense(64, activation=tf.nn.relu))
model.add(Dense(10, activation=tf.nn.softmax))
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
x_val = x_train[:10000]
y_val = labels[:10000]
model.fit(x_train, y_train, epochs=3)  # , validation_data=(x_val, y_val))

My model performs decently (I think), as I only have about 350 misses out of 8400 images. I've got it down to about 220 by adding more layers, changing numbers, using Dropout, etc.

I'd really like to understand how to choose the numbers I should be plugging in. And also, do I need an input_shape?

mwilson

1 Answer

Your first question:
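The input shape has to be known before the parameters (weights and biases) of your layers can be created, so it is best to specify input_shape in the first layer, which in your case is the Flatten layer. If you leave it out, Keras postpones creating the parameters until the model first sees data (for example in fit), which is why examples without it still run. In your case, the number of parameters in your first Dense layer depends on the input shape that reaches the Flatten layer.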
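To make the dependence on the input shape concrete, here is a small sketch in plain Python (no Keras needed) that counts the parameters of the two Dense layers in your model. It assumes the images are 28x28 grayscale, so that Flatten produces 784 values; the helper name dense_param_count is just for illustration and is not a Keras function.

```python
def dense_param_count(input_dim: int, units: int) -> int:
    """A Dense layer has input_dim * units weights plus one bias per unit."""
    return input_dim * units + units


# Assumption: 28x28 grayscale images, so Flatten outputs 28 * 28 = 784 values.
flattened = 28 * 28

hidden_params = dense_param_count(flattened, 64)  # Dense(64) after Flatten
output_params = dense_param_count(64, 10)         # Dense(10) after Dense(64)
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 50240 650 50890
```

If the input were a different size, only the first count would change, which is why the first layer's weights cannot be created until the input shape is known.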

Second question:
If the Dense layer is the last layer of your model, units should be the number of classes (or outputs) you want the model to predict, which is 10 for digit recognition. For hidden layers, however, as far as I know there is no universal rule or formula that guarantees an optimal number of units. It depends on the data you feed into the model, the complexity of the task, and so on. I would say it should be chosen on a trial-and-error basis.
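As a rough illustration of what units controls, the following NumPy sketch mimics the forward pass of a Dense(64) layer followed by a Dense(10) layer. It is not how Keras is implemented internally; the random weights, the batch of 5 fake images, and the 784-value input are all assumptions made for the example. The point is only that units sets the width of each layer's output, and that the last layer's width matches the number of classes.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def dense(x, weights, bias, activation):
    """One fully connected layer: activation(x @ weights + bias)."""
    return activation(x @ weights + bias)


rng = np.random.default_rng(0)
batch, n_inputs, hidden_units, n_classes = 5, 784, 64, 10

x = rng.random((batch, n_inputs))  # stand-in for 5 flattened 28x28 images
w1 = rng.normal(scale=0.05, size=(n_inputs, hidden_units))
b1 = np.zeros(hidden_units)
w2 = rng.normal(scale=0.05, size=(hidden_units, n_classes))
b2 = np.zeros(n_classes)

hidden = dense(x, w1, b1, relu)         # shape (5, 64): one value per hidden unit
probs = dense(hidden, w2, b2, softmax)  # shape (5, 10): one probability per class

print(hidden.shape, probs.shape)
```

Changing hidden_units only changes the shapes of w1, b1, and w2; the output stays at 10 columns because that is fixed by the number of classes.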

Edit:
Here is some further reading on your second question, in case my answer doesn't fully satisfy you.
EfficientNet paper: but it is for Convolutional Neural Networks
"Rule of thumb": ...

meowongac