
I am building a CNN with Conv1D layers, and it trains pretty well. I'm now looking into how to reduce the number of features before feeding them into a Dense layer at the end of the model. So far I've been shrinking the Dense layer itself, but then I came across this article, which describes using Conv2D filters with kernel_size=(1,1) to reduce the number of features.

I was wondering what the difference is between using a Conv2D layer with kernel_size=(1,1), i.e. tf.keras.layers.Conv2D(filters=n, kernel_size=(1,1)), and using a Dense layer of the same size, tf.keras.layers.Dense(units=n). From my perspective (I'm relatively new to neural nets), a filter with kernel_size=(1,1) is a single number, which is essentially equivalent to a weight in a Dense layer, and both layers have biases, so are they equivalent, or am I misunderstanding something? And if my understanding is correct, does it change anything that I'm using Conv1D layers rather than Conv2D layers? That is, is tf.keras.layers.Conv1D(filters=n, kernel_size=1) equivalent to tf.keras.layers.Dense(units=n)?

Please let me know if you need anything from me to clarify the question. I'm mostly curious whether Conv1D layers with kernel_size=1 and Conv2D layers with kernel_size=(1,1) behave differently from Dense layers.

mickey

2 Answers


Yes. Since a Dense layer is applied on the last dimension of its input (see this answer), Dense(units=N) and Conv1D(filters=N, kernel_size=1) (or Dense(units=N) and Conv2D(filters=N, kernel_size=(1,1))) are basically equivalent to each other, both in terms of connections and in number of trainable parameters.
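You can check the equivalence directly in plain NumPy (a minimal sketch, with the toy shapes chosen arbitrarily): applying a Dense kernel to the last axis and sliding a kernel_size=1 convolution over the timesteps are the same arithmetic, and both layers carry in_features * units + units trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input: batch of 2 sequences, 5 timesteps, 4 features each.
x = rng.normal(size=(2, 5, 4))

# Shared weights: Dense(units=3) holds a (4, 3) kernel and a (3,) bias;
# Conv1D(filters=3, kernel_size=1) holds a (1, 4, 3) kernel and a (3,) bias.
# Either way: 4 * 3 + 3 = 15 trainable parameters.
W = rng.normal(size=(4, 3))
b = rng.normal(size=(3,))

# Dense applied to the last axis: y[i, t] = x[i, t] @ W + b
dense_out = x @ W + b

# Conv1D with kernel_size=1: at every timestep the same (4, 3) matrix
# multiplies the feature vector -- identical arithmetic.
conv_kernel = W[np.newaxis, :, :]                      # shape (1, 4, 3)
conv_out = np.einsum('btf,kfn->btn', x, conv_kernel) + b

print(np.allclose(dense_out, conv_out))                # -> True
```

The same argument carries over to Conv2D with kernel_size=(1,1): the kernel touches one spatial position at a time, so it reduces to a per-pixel matrix multiply over the channel axis.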

today
  • The elitist but technically reasonable explanation I learned for this question is that a purely convolutional network has no MLPs, instead using kernel size 1 to achieve a similar function. – TheLoneDeranger Aug 18 '19 at 08:54
  • @TheLoneDeranger to clarify, when you say MLPs, I think you are referencing Dense layers, is that correct? – mickey Feb 03 '20 at 18:27
  • Precisely. Of course, in the vast majority of practical circumstances, there's no need to be pedantic about it; might as well use dense layers if you can. I can imagine some hardware neural networks perhaps optimized by 'pure' CNNs, but otherwise... :) – TheLoneDeranger Feb 07 '20 at 04:19

In a 1D CNN, the kernel moves in 1 direction. The input and output data of a 1D CNN are 2-dimensional (steps, channels), not counting the batch axis. Mostly used on time-series data, Natural Language Processing tasks, etc. You'll definitely see people using it in Kaggle NLP competitions and notebooks.

In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN are 3-dimensional (height, width, channels). Mostly used on image data. You'll definitely see people using it in Kaggle image-processing competitions and notebooks.

In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN are 4-dimensional (depth, height, width, channels). Mostly used on 3D image data (MRI, CT scans). I haven't personally seen it applied in competitions.
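The per-sample dimensionalities above can be sketched with the channels-last shapes Keras actually expects once the batch axis is included (the concrete sizes below are arbitrary, chosen just for illustration):

```python
import numpy as np

# Channels-last tensors as Keras ConvND layers expect them, batch axis first.
# Per sample, a kD conv sees a (k+1)-dimensional array: k spatial/temporal
# axes plus the channel axis.
seq = np.zeros((8, 100, 16))         # Conv1D: (batch, steps, channels)
img = np.zeros((8, 64, 64, 3))       # Conv2D: (batch, height, width, channels)
vol = np.zeros((8, 32, 32, 32, 1))   # Conv3D: (batch, depth, height, width, channels)

print(seq.ndim, img.ndim, vol.ndim)  # -> 3 4 5
```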