I set up a model with Keras, trained it on a dataset of 3 records, and then tested the resulting model with evaluate() and predict(), using the same test set for both functions (the test set has 100 records and shares no record with the training set, as far as that matters given the size of the two datasets). The dataset consists of 5 files: 4 of them each represent a different temperature sensor that collects 60 measurements per minute (so each row contains 60 measurements), while the last file contains the class labels I want to predict (3 classes: 3, 20 or 100).
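To make the shapes concrete: each sample is a 60×4 matrix (60 measurements from each of the 4 sensors) and the labels are one-hot encoded over the 3 classes. Below is a minimal sketch with dummy data, only to illustrate the shapes; it is not my actual loading code (that is further down), and the 3→0, 20→1, 100→2 mapping is just an assumption for this sketch:

import numpy as np
from keras.utils import np_utils  # same helper I use later for the real labels

# Dummy arrays, only to illustrate the shapes the network works with.
X_dummy = np.random.rand(100, 60, 4)            # 100 samples, 60 measurements, 4 sensors
y_raw = np.random.choice([3, 20, 100], 100)     # the three possible class labels
class_index = {3: 0, 20: 1, 100: 2}             # hypothetical mapping, just for this sketch
y_dummy = np_utils.to_categorical([class_index[v] for v in y_raw], 3)  # shape (100, 3)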
This is the model I'm using:
from keras.models import Sequential
from keras.layers import Conv1D, Dense, Dropout, GlobalAveragePooling1D, MaxPooling1D

n_sensors, t_periods = 4, 60

model = Sequential()
model.add(Conv1D(100, 6, activation='relu', input_shape=(t_periods, n_sensors)))
model.add(Conv1D(100, 6, activation='relu'))
model.add(MaxPooling1D(3))
model.add(Conv1D(160, 6, activation='relu'))
model.add(Conv1D(160, 6, activation='relu'))
model.add(GlobalAveragePooling1D())
model.add(Dropout(0.5))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
I train it like this:
self.model.fit(X_train, y_train, batch_size=3, epochs=5, verbose=1)
Then I use evaluate():
self.model.evaluate(x_test, y_test, verbose=1)
And predict():
predictions = self.model.predict(data)
result = np.where(predictions[0] == np.amax(predictions[0]))
if result[0][0] == 0:
    return '3'
elif result[0][0] == 1:
    return '20'
else:
    return '100'
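As far as I understand, the np.where lookup above just picks the index of the largest probability, so the same decoding could be written more compactly with np.argmax (a sketch of the equivalent logic):

import numpy as np

# Equivalent decoding of the class index: 0 -> '3', 1 -> '20', 2 -> '100'
class_names = ['3', '20', '100']
predicted_label = class_names[int(np.argmax(predictions[0]))]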
For each predicted class, I compare it with the actual label and then calculate correct guesses / total examples, which should be equivalent to the accuracy from evaluate(). Here's the code:
correct = 0
for profile in self.profile_file:  # profile_file is an opened file
    ts1 = self.ts1_file.readline()
    ts2 = self.ts2_file.readline()
    ts3 = self.ts3_file.readline()
    ts4 = self.ts4_file.readline()
    data = ts1, ts2, ts3, ts4
    test_data = self.dl.transform(data)  # see the last block of code I posted
    prediction = self.model.predict(test_data)
    if prediction == label:
        correct += 1
acc = correct / 100  # 100 is the number of total examples
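For comparison, the same count can also be done in one shot on the whole test set, using the same arrays I pass to evaluate(). This is just a sketch, assuming x_test and y_test are the arrays returned by the function shown next:

import numpy as np

# Accuracy from predict() on the whole test set at once; x_test and y_test are
# the same arrays that go into evaluate() (y_test is one-hot encoded).
probs = self.model.predict(x_test)
predicted_classes = np.argmax(probs, axis=1)
true_classes = np.argmax(y_test, axis=1)
acc_from_predict = np.mean(predicted_classes == true_classes)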
The data fed to evaluate() is built by this function:
label = pd.read_csv(os.path.join(self.testDir, 'profile.txt'), sep='\t', header=None)
label = np_utils.to_categorical(label[0].factorize()[0])
data = [os.path.join(self.testDir, 'TS2.txt'), os.path.join(self.testDir, 'TS1.txt'),
        os.path.join(self.testDir, 'TS3.txt'), os.path.join(self.testDir, 'TS4.txt')]
df = pd.DataFrame()
for txt in data:
    read_df = pd.read_csv(txt, sep='\t', header=None)
    df = df.append(read_df)
df = df.apply(self.__predict_scale)
df = df.sort_index().values.reshape(-1, 4, 60).transpose(0, 2, 1)
return df, label
while the data fed to predict() is built by this other one:
df = pd.DataFrame()
for txt in data:  # data is the tuple of lines read in the loop above
    read_df = pd.read_csv(StringIO(txt), sep='\t', header=None)
    df = df.append(read_df)
df = df.apply(self.__predict_scale)
df = df.sort_index().values.reshape(-1, 4, 60).transpose(0, 2, 1)
return df
The accuracies yielded by evaluate() and predict() are always different: in particular, the largest gap I've seen was evaluate() reporting 78% accuracy while predict() gave 95%. The only difference between the two is that I run predict() on one example at a time, while evaluate() takes the entire dataset at once, but that should make no difference. How can this be?
UPDATE 1: It seems that the problem is in how I prepare my data. For predict() I transform only one line at a time from each file, using the last block of code I posted, while for evaluate() I transform the entire files with the other function. Why should that be different? It seems to me that I'm applying the exact same transformation; the only difference is the number of rows transformed.
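One way I could check that assumption is to transform the whole files once and then compare the result with the per-line transform. This is only a sketch: transform_all and transform_one are placeholder names for the two preprocessing functions above, and first_lines stands for the first line read from each of the four sensor files.

import numpy as np

# Hypothetical check: do the whole-file transform and the per-line transform
# produce the same values for the first example?
full_data, _ = transform_all()            # array fed to evaluate(), shape (n_samples, 60, 4)
one_sample = transform_one(first_lines)   # array fed to predict(), shape (1, 60, 4)
print(np.allclose(full_data[0], one_sample[0]))  # True only if the two transforms really match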