
I have been trying to tune a neural net for some time now, but unfortunately I cannot get good performance out of it. I have a time-series dataset and I am using RandomizedSearchCV for binary classification. My code is below. Any suggestions or help will be appreciated. One thing I am still trying to figure out is how to incorporate early stopping.

EDIT: Forgot to add that I am measuring performance with the F1-macro metric and I cannot get a score higher than 0.68. Another thing I noticed is that the more parameters I try to estimate at once (the larger my grid), the worse my score gets.

train_size = int(0.70*X.shape[0])
X_train, X_test, y_train, y_test = X[0:train_size], X[train_size:],y[0:train_size], y[train_size:]


from numpy.random import seed
seed(3)
from tensorflow import set_random_seed
set_random_seed(4)

# Imports for the classes used below (standalone Keras API assumed)
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import TimeSeriesSplit, RandomizedSearchCV
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

def create_model(activation_1='relu', activation_2='relu', 
                 neurons_input = 1, neurons_hidden_1=1,
                 optimizer='adam',
                 input_shape=(X_train.shape[1],)):

  model = Sequential()
  model.add(Dense(neurons_input, activation=activation_1, input_shape=input_shape, kernel_initializer='random_uniform'))

  model.add(Dense(neurons_hidden_1, activation=activation_2, kernel_initializer='random_uniform'))


  model.add(Dense(2, activation='sigmoid'))

  model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer)
  return model


clf=KerasClassifier(build_fn=create_model, verbose=0)

param_grid = {
    'clf__neurons_input':[5, 10, 15, 20, 25, 30, 35],
    'clf__neurons_hidden_1':[5, 10, 15, 20, 25, 30, 35],
    'clf__optimizer': ['Adam', 'Adamax','Adadelta'],
    'clf__activation_1': ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear'],
    'clf__activation_2': ['softmax', 'softplus', 'softsign', 'relu', 'tanh', 'sigmoid', 'hard_sigmoid', 'linear'],
    'clf__batch_size': [40,60,80,100]}


pipe = Pipeline([
    ('oversample', SMOTE(random_state=12)),
    ('clf', clf)
    ])

my_cv = TimeSeriesSplit(n_splits=5).split(X_train)

rs_keras = RandomizedSearchCV(pipe, param_grid, cv=my_cv, scoring='f1_macro', refit='f1_macro', verbose=3, n_jobs=1,random_state=42)
rs_keras.fit(X_train, y_train)

print("Best: %f using %s" % (rs_keras.best_score_, rs_keras.best_params_))

from sklearn.metrics import classification_report, confusion_matrix
y_pred=rs_keras.predict(X_test)
clfreport = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
print (clfreport)
print (cm)
scores_test = rs_keras.score(X_test,y_test)
print ("Testing:", scores_test)

My scores: [image]

eemamedo
  • It seems like you have very imbalanced data. What do you do about it? Maybe the problem is that all your models fail to find a good solution due to the imbalance and not due to the hyperparameter optimisation? – Mischa Lisovyi Apr 16 '19 at 05:27
  • Using SMOTE via the pipeline. – eemamedo Apr 16 '19 at 19:46
  • OK, makes sense, I can find it in the code now. The next question that comes to my mind with no access to the data: why do you think that an F1 of 0.68 is not already good? Maybe your network architecture does not fit the problem? (Time series are often tackled with RNNs instead of fully-connected networks.) Or maybe the stats of `class 1` fluctuate a lot (since its fraction is small) and your test sample is very different from training? (What is the performance on the training set, and did you do EDA to compare distributions between the train and test sets?) – Mischa Lisovyi Apr 17 '19 at 12:22
  • I will take a look at the test set once again. I wonder if SMOTE messes up the underlying distribution so much that the test set is no longer representative of the balanced training set. – eemamedo Apr 18 '19 at 13:48
  • In order to eliminate `SMOTE` artifacts, you can also try to use class weights in training instead of resampling (see the sketch after these comments). There was a somewhat related question [here](https://stackoverflow.com/questions/54143543/grid-search-and-kerasclassifier-using-class-weights). – Mischa Lisovyi Apr 18 '19 at 16:37
  • I tried class_weights as well, but it does not seem to work properly; the model still behaves as if trained on an imbalanced dataset (the final metric is strongly dominated by the majority class). – eemamedo Apr 18 '19 at 16:44
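
For reference, a minimal sketch of the class-weight route discussed in the comments above, reusing the names from the question (`clf`, `param_grid`, `X_train`, `y_train`) and the step name `'clf'`. Whether the `clf__` fit params reach Keras this way depends on the sklearn/imblearn/Keras versions, so treat this as an assumption rather than a verified recipe:

# Sketch only: replace SMOTE resampling with class weights (labels assumed to be 0 and 1)
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Weights inversely proportional to class frequencies in the training split
weights = compute_class_weight(class_weight='balanced',
                               classes=np.unique(y_train), y=y_train)
class_weight = dict(enumerate(weights))

# Keep a pipeline so the 'clf__' prefixes in param_grid still apply, but drop SMOTE
pipe_cw = Pipeline([('clf', clf)])
my_cv = TimeSeriesSplit(n_splits=5).split(X_train)   # the split generator must be recreated

rs_cw = RandomizedSearchCV(pipe_cw, param_grid, cv=my_cv, scoring='f1_macro',
                           verbose=3, n_jobs=1, random_state=42)
# 'clf__class_weight' is routed by the pipeline to KerasClassifier.fit()
rs_cw.fit(X_train, y_train, clf__class_weight=class_weight)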

1 Answer


About EarlyStopping,

from keras.callbacks import EarlyStopping

clf = KerasClassifier(build_fn=create_model, verbose=0)

stop = EarlyStopping(monitor='your_metric', min_delta=0,
                     patience=5, verbose=1, mode='auto',
                     baseline=None, restore_best_weights=True)
.
.
.
grid.fit(x_train_sc, y_train_sc, callbacks = [stop])

It should work. (I tested it without pipeline structure.)

By the way, when I tried my own dataset with the pipeline structure, it did not behave as I expected. In my case I wanted to StandardScale the data, but the grid search did not scale it first, so it went into the classifier unscaled. That was an issue for me.

I suggest you transform the data before the grid search and try it without the pipeline. I know about the data-leakage problem, but I could not find any other way.
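
A minimal sketch of that suggestion, using the names from the question; scaling the features is an assumption here, since the question never says whether they were scaled:

# Sketch only: scale once before the search instead of inside the pipeline
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_sc = scaler.fit_transform(X_train)   # statistics come from the training split only
X_test_sc = scaler.transform(X_test)         # reuse the same statistics for the test split

# Note: each CV fold inside the search still sees data scaled with statistics
# from the whole training split, which is the leakage mentioned above.
rs_keras.fit(X_train_sc, y_train)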

Masmm
  • I tried early stopping but it seems to take into consideration only the training loss. For val_loss it throws an error that says something like "this is not an acceptable metric". I am inclined to move towards l1_l2 regularizers or add dropouts. – eemamedo Apr 18 '19 at 13:45
  • And regarding a pipeline; I will take a look at it. It should not make a huge difference but I will try it regardless. Thank you. – eemamedo Apr 18 '19 at 13:46
  • val_loss is the default monitoring metric, so it should have accepted it. Probably the EarlyStopping callback only sees what KerasClassifier itself trains on (i.e. the validation is conducted by GridSearchCV, not by Keras). You can try to pass `validation_split=x.x` to the fit function via **fit_params (see the sketch after these comments), but I doubt this is the optimal way. If you have time, you can also use the search only as a first pass: take the best_params_, use them to train the network, and tune from there. (Increasing the batch size could speed up the process...) – Masmm Apr 18 '19 at 15:52
  • @eemamedo For regularization : [Link](https://medium.com/machine-learning-bites/deeplearning-series-deep-neural-networks-tuning-and-optimization-39250ff7786d) – Masmm Apr 18 '19 at 15:56
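
Following up on the fit-params idea in the comments above, a hedged sketch of how the callback and a validation split could be routed through the pipeline step named 'clf' from the question; whether these kwargs actually reach Keras' fit() this way depends on the sklearn/Keras versions in use:

# Sketch only: route EarlyStopping and a validation split through the 'clf' step
from keras.callbacks import EarlyStopping

stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1,
                     restore_best_weights=True)

# Keras holds out the last 10% of each training fold as validation data,
# which gives EarlyStopping a val_loss to monitor.
rs_keras.fit(X_train, y_train,
             clf__validation_split=0.1,
             clf__callbacks=[stop])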