
I am building a CNN with Conv1D layers, and it trains pretty well. I'm now looking into how to reduce the number of features before feeding them into a Dense layer at the end of the model. So far I've been shrinking the Dense layer itself, but then I came across this article, which describes using Conv2D filters with kernel_size=(1,1) to reduce the number of features.

I was wondering what the difference is between using a Conv2D layer with kernel_size=(1,1), i.e. tf.keras.layers.Conv2D(filters=n, kernel_size=(1,1)), and using a Dense layer of the same size, tf.keras.layers.Dense(units=n). From my perspective (I'm relatively new to neural nets), a filter with kernel_size=(1,1) is a single number, which is essentially equivalent to a weight in a Dense layer, and both layers have biases, so are they equivalent, or am I misunderstanding something? And if my understanding is correct, does it change anything that I'm using Conv1D layers rather than Conv2D layers? That is, is tf.keras.layers.Conv1D(filters=n, kernel_size=1) equivalent to tf.keras.layers.Dense(units=n)?

Please let me know if you need anything from me to clarify the question. I'm mostly curious whether Conv1D layers with kernel_size=1 and Conv2D layers with kernel_size=(1,1) behave differently from Dense layers.

mickey

2 Answers


Yes. Since a Dense layer is applied on the last dimension of its input (see this answer), Dense(units=N) and Conv1D(filters=N, kernel_size=1) (or Dense(units=N) and Conv2D(filters=N, kernel_size=(1,1))) are basically equivalent to each other, both in terms of connections and in number of trainable parameters.
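You can check the equivalence directly in plain NumPy (a minimal sketch, with the toy shapes chosen arbitrarily): applying a Dense kernel to the last axis and sliding a kernel_size=1 convolution over the timesteps are the same arithmetic, and both layers carry in_features * units + units trainable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy input: batch of 2 sequences, 5 timesteps, 4 features each.
x = rng.normal(size=(2, 5, 4))

# Shared weights: Dense(units=3) holds a (4, 3) kernel and a (3,) bias;
# Conv1D(filters=3, kernel_size=1) holds a (1, 4, 3) kernel and a (3,) bias.
# Either way: 4 * 3 + 3 = 15 trainable parameters.
W = rng.normal(size=(4, 3))
b = rng.normal(size=(3,))

# Dense applied to the last axis: y[i, t] = x[i, t] @ W + b
dense_out = x @ W + b

# Conv1D with kernel_size=1: at every timestep the same (4, 3) matrix
# multiplies the feature vector -- identical arithmetic.
conv_kernel = W[np.newaxis, :, :]                      # shape (1, 4, 3)
conv_out = np.einsum('btf,kfn->btn', x, conv_kernel) + b

print(np.allclose(dense_out, conv_out))                # -> True
```

The same argument carries over to Conv2D with kernel_size=(1,1): the kernel touches one spatial position at a time, so it reduces to a per-pixel matrix multiply over the channel axis.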

today
  • The elitist but technically reasonable explanation I learned for this question is that a purely convolutional network has no MLPs, instead using kernel size 1 to achieve a similar function. – TheLoneDeranger Aug 18 '19 at 08:54
  • @TheLoneDeranger to clarify, when you say MLPs, I think you are referencing Dense layers, is that correct? – mickey Feb 03 '20 at 18:27
  • Precisely. Of course, in the vast majority of practical circumstances, there's no need to be pedantic about it; might as well use dense layers if you can. I can imagine some hardware neural networks perhaps optimized by 'pure' CNNs, but otherwise... :) – TheLoneDeranger Feb 07 '20 at 04:19

In a 1D CNN, the kernel moves in 1 direction. The input and output data of a 1D CNN are 2-dimensional (steps, channels), not counting the batch axis. Mostly used on time-series data, Natural Language Processing tasks, etc. You'll definitely see people using it in Kaggle NLP competitions and notebooks.

In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN are 3-dimensional (height, width, channels). Mostly used on image data. You'll definitely see people using it in Kaggle image-processing competitions and notebooks.

In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN are 4-dimensional (depth, height, width, channels). Mostly used on 3D image data (MRI, CT scans). I haven't personally seen it applied in competitions.
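The per-sample dimensionalities above can be sketched with the channels-last shapes Keras actually expects once the batch axis is included (the concrete sizes below are arbitrary, chosen just for illustration):

```python
import numpy as np

# Channels-last tensors as Keras ConvND layers expect them, batch axis first.
# Per sample, a kD conv sees a (k+1)-dimensional array: k spatial/temporal
# axes plus the channel axis.
seq = np.zeros((8, 100, 16))         # Conv1D: (batch, steps, channels)
img = np.zeros((8, 64, 64, 3))       # Conv2D: (batch, height, width, channels)
vol = np.zeros((8, 32, 32, 32, 1))   # Conv3D: (batch, depth, height, width, channels)

print(seq.ndim, img.ndim, vol.ndim)  # -> 3 4 5
```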