I am trying to understand how the 1D convolutional layer works.
Let's prepare the data:
import numpy as np
import pandas as pd
import keras
from sklearn.model_selection import train_test_split
# The CSV already has a header row; header=0 replaces it with the names below
# (header=1 would also silently drop the first data row).
df = pd.read_csv('https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv',
                 names=['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species'], header=0)
# Encode the species strings as integer labels 0, 1, 2
df['labels'] = df['species'].astype('category').cat.codes
X = df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
ylab = df['labels']
x_train, x_test, y_train, y_test = train_test_split(np.asarray(X), np.asarray(ylab), test_size=0.33, shuffle=True)
# The known number of output classes.
num_classes = 3
# Number of features per sample
input_shape = (4,)
# Convert class vectors to binary class matrices (one-hot encoding).
y_train_binary = keras.utils.to_categorical(y_train, num_classes)
y_test_binary = keras.utils.to_categorical(y_test, num_classes)
# Conv1D expects (samples, steps, channels): the 4 features become 4 steps with 1 channel.
# -1 lets NumPy infer the number of samples instead of hard-coding it.
x_train = x_train.reshape(-1, 4, 1)
x_test = x_test.reshape(-1, 4, 1)
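The reshape to (samples, 4, 1) is what lets Conv1D slide along the features: Keras's Conv1D reads its input as (batch, steps, channels), so each sample's 4 features become 4 steps with a single channel. A small NumPy sketch, using a made-up 2-sample array instead of the downloaded iris data, shows the layout:

```python
import numpy as np

# Hypothetical stand-in for two iris rows (4 features each)
x = np.array([[5.1, 3.5, 1.4, 0.2],
              [6.2, 2.9, 4.3, 1.3]])

# Same (samples, steps, channels) layout as the training data above
x3d = x.reshape(-1, 4, 1)
print(x3d.shape)      # (2, 4, 1)

# Sample 0, step 2, channel 0 is the third feature (petal_length)
print(x3d[0, 2, 0])   # 1.4
```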
Here is the Keras model:
from __future__ import print_function
import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv1D
from keras.callbacks import ModelCheckpoint
from keras.models import model_from_json
from keras import backend as K
model = Sequential()
# 10 filters with kernel_size=4: each kernel spans all 4 steps (features), the largest
# kernel that fits here, so every filter produces exactly one output value
model.add(Conv1D(10, 4, input_shape=(4, 1), activation='relu'))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.summary()
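As a cross-check on the parameter counts model.summary() should report, they follow from the layer shapes: a Conv1D layer stores one kernel of shape (kernel_size, in_channels) plus one bias per filter, and a Dense layer stores one weight per input/unit pair plus one bias per unit. The helper names below are made up for this sketch:

```python
def conv1d_params(kernel_size, in_channels, filters):
    # one (kernel_size x in_channels) weight block plus one bias per filter
    return kernel_size * in_channels * filters + filters

def dense_params(in_features, units):
    # full weight matrix plus one bias per unit
    return in_features * units + units

conv = conv1d_params(kernel_size=4, in_channels=1, filters=10)
# With 4 steps and kernel_size=4 ('valid' padding) the Conv1D output is (1, 10);
# Flatten turns it into 10 features for the Dense layer.
dense = dense_params(in_features=1 * 10, units=3)
print(conv, dense, conv + dense)  # 50 33 83
```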
My data is 4-dimensional, so it seems the filter size (kernel_size) of the Conv1D layer cannot be larger than 4; setting it to 4 means all four features are included in the convolution. I am curious what the first argument of Conv1D, filters, means.
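The upper limit of 4 comes from 'valid' padding, the Conv1D default: a kernel of size k sliding with stride s over L steps fits at floor((L - k) / s) + 1 positions, which drops to zero once k > L. A short sketch (the function name is hypothetical):

```python
def conv1d_output_length(steps, kernel_size, stride=1):
    # 'valid' padding: the kernel must fit entirely inside the input
    return (steps - kernel_size) // stride + 1

for k in range(1, 6):
    print(k, conv1d_output_length(4, k))
# 1 4
# 2 3
# 3 2
# 4 1
# 5 0  -> no position where the kernel fits
```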
According to the documentation:
filters: Integer, the dimensionality of the output space (i.e. the number of output filters in the convolution).
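Written out, the documentation's point is that every filter has its own kernel and contributes one output channel. For an input x with L steps and C channels, filter f with kernel W_f (size k x C), bias b_f and the ReLU used in the model above, the output at position t is as follows (assuming stride 1 and 'valid' padding; like Keras, this is cross-correlation, so the kernel is not flipped):

$$
y_{t,f} = \mathrm{ReLU}\Big(b_f + \sum_{j=0}^{k-1} \sum_{c=0}^{C-1} W_{j,c,f}\, x_{t+j,\,c}\Big),
\qquad t = 0, \dots, L-k, \quad f = 0, \dots, F-1
$$

With L = 4, C = 1, k = 4 and F = 10, t can only be 0, so the layer outputs 10 numbers: ten differently weighted sums of the same 4 features. F sets the output width independently of the number of input features.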
So I am curious how 10 outputs can be generated from only four features (maybe I am missing something). I would appreciate a visual explanation, or a formula that shows how the output is derived. I have also noticed that increasing the number of filters improves performance (1 vs. 10), so ideally I would like to understand how the number of filters influences performance.
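To see the 4 to 10 mapping concretely, here is a NumPy sketch of the forward pass of Conv1D(10, 4) on one sample, with random weights standing in for learned ones. It reimplements the convolution sum by hand and is not Keras's own code. The last lines compare it with a plain matrix product: when kernel_size equals the input length, the layer behaves like a dense layer from 4 inputs to 10 outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

steps, channels = 4, 1        # one iris sample reshaped to (4, 1)
kernel_size, filters = 4, 10  # Conv1D(10, 4)

x = rng.normal(size=(steps, channels))
W = rng.normal(size=(kernel_size, channels, filters))  # one kernel per filter
b = rng.normal(size=filters)                           # one bias per filter

def conv1d_valid(x, W, b):
    k = W.shape[0]
    out_len = x.shape[0] - k + 1
    y = np.empty((out_len, W.shape[2]))
    for t in range(out_len):
        window = x[t:t + k]  # shape (k, channels)
        # every filter f computes its own weighted sum of the same window
        y[t] = np.tensordot(window, W, axes=([0, 1], [0, 1])) + b
    return np.maximum(y, 0.0)  # ReLU

y = conv1d_valid(x, W, b)
print(y.shape)  # (1, 10): one position, ten filters

# Same result as a dense layer whose (4, 10) weight matrix is W flattened
dense = np.maximum(x.reshape(1, -1) @ W.reshape(-1, filters) + b, 0.0)
print(np.allclose(y, dense))  # True
```

Because every filter is a separate learned combination of the features, one filter hands the Dense layer a single feature to classify from while ten filters hand it ten; that extra capacity is a plausible reason for the difference observed, though it is not guaranteed on every split.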