
I am trying to understand how the 1D convolutional layer works.

Let's prepare the data

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
import keras
df = pd.read_csv('https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv',
                 names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'],
                 header=0)  # the file's own header row is replaced by `names`

df['labels'] = df['species'].astype('category').cat.codes

X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
ylab = df['labels']
x_train, x_test, y_train, y_test = train_test_split(np.asarray(X), np.asarray(ylab), test_size=0.33, shuffle= True)

# The known number of output classes.
num_classes = 3

# Input image dimensions
input_shape = (4,)

# Convert class vectors to binary class matrices. This uses 1 hot encoding.
y_train_binary = keras.utils.to_categorical(y_train, num_classes)
y_test_binary = keras.utils.to_categorical(y_test, num_classes)

# Reshape to (samples, steps, channels), which is the input layout Conv1D expects
x_train = x_train.reshape(-1, 4, 1)
x_test = x_test.reshape(-1, 4, 1)
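
As a quick sanity check (a minimal sketch; the exact sample counts depend on the random train/test split above), the arrays now have the (samples, steps, channels) layout that Conv1D expects:

# Conv1D expects input shaped (samples, steps, channels)
print(x_train.shape)         # e.g. (100, 4, 1): 4 "steps", 1 channel each
print(x_test.shape)          # e.g. (50, 4, 1)
print(y_train_binary.shape)  # (samples, 3) after one-hot encoding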

Here is the Keras model,

from __future__ import print_function

import keras
from keras import backend as K
from keras.models import Sequential, model_from_json
from keras.layers import Dense, Flatten, Conv1D
from keras.callbacks import ModelCheckpoint

model = Sequential()
model.add(Conv1D(10, 4, input_shape=(4, 1), activation='relu'))  # kernel size 4 spans all 4 features, so it is the largest kernel that fits this input
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.summary()

My data has 4 features, so it seems the kernel size of the Conv1D cannot be larger than 4. Setting it to 4 means all 4 features are covered by the convolution. What I am curious about is the first argument of Conv1D, filters.

Based on the documentation

filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).

So I am curious how, out of four features, 10 outputs can be generated (maybe I am missing something). I would appreciate a visual explanation, or a formula that shows how to derive the output. I have also noticed that increasing the number of filters improves performance (1 vs 10), so ideally I would like to understand how it influences performance.


1 Answer

Assuming 'valid' padding and a convolution of a 4-unit-long input with a 4-unit-wide kernel, here is how it works: valid padding gives a single output position in this case, so each filter produces exactly one number.
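
For 'valid' padding and stride 1 the formula is: output_steps = input_steps - kernel_size + 1, so here 4 - 4 + 1 = 1. Each of the 10 filters produces that single output, giving 10 values in total. A minimal sketch to verify the shapes with Keras (it mirrors the layer from the question, not your exact trained model):

from keras.models import Sequential
from keras.layers import Conv1D

m = Sequential()
m.add(Conv1D(10, 4, input_shape=(4, 1), activation='relu'))
print(m.output_shape)    # (None, 1, 10): 1 output step, 10 filters
# Each filter holds 4*1 weights plus 1 bias, so the layer has 10*(4*1+1) = 50 parameters
print(m.count_params())  # 50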

So, as you can see, you can generate any arbitrary number of outputs. All you need is more kernels (or filters). Each filter is essentially a TensorFlow variable (a weight tensor) learned during training. This is what the filters argument of the layer means. It has nothing to do with the width of the filter, which is set by the second argument (kernel_size) in your example.
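
To make that concrete, here is a rough numpy sketch (an illustration of the arithmetic, not the actual Keras implementation) of what one Conv1D forward pass with 10 filters does to a single 4-step, 1-channel sample:

import numpy as np

x = np.random.rand(4, 1)            # one sample: 4 steps, 1 channel
kernels = np.random.rand(10, 4, 1)  # 10 filters, each 4 units wide, 1 channel
biases = np.zeros(10)

# 'valid' padding with a width-4 kernel on a length-4 input leaves one position per filter
outputs = np.array([np.sum(x * k) + b for k, b in zip(kernels, biases)])
print(outputs.shape)  # (10,): one number per filter, which Flatten then hands to Dense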

[Diagram: each 4-unit-wide filter slides over the 4-unit input once, producing one output value per filter]

More information on how 1D/2D/3D convolution works: Here

How the performance is affected

Roughly speaking, filters learn feature representations of the input. The more representations you have, the better the performance tends to be (generally, though not always). Having more filters than needed can also lead to overfitting, so you need to strike a balance, typically through hyperparameter optimization.
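
As a rough illustration of how you could probe this yourself, here is a sketch (not a tuned setup) that reuses the data prepared in the question and compares a few filter counts on the held-out set:

from keras.models import Sequential
from keras.layers import Conv1D, Flatten, Dense

for n_filters in [1, 2, 5, 10, 20]:
    m = Sequential()
    m.add(Conv1D(n_filters, 4, input_shape=(4, 1), activation='relu'))
    m.add(Flatten())
    m.add(Dense(num_classes, activation='softmax'))
    m.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    hist = m.fit(x_train, y_train_binary, epochs=50, verbose=0,
                 validation_data=(x_test, y_test_binary))
    # the history key may be 'val_accuracy' in newer Keras versions
    print(n_filters, 'filters -> validation accuracy:', hist.history['val_acc'][-1])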
