
I am trying to build an LSTM model to detect the sentiment of texts (0 -> normal, 1 -> hateful). After training the model, I send some texts to it for prediction and the predicted results are as I expected. However, after I load the model from the "h5" file, I cannot get the same results even though I send the same texts. Here is my training code:


    from sklearn.preprocessing import LabelEncoder
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.preprocessing.text import Tokenizer
    from tensorflow.keras.preprocessing import sequence
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.models import Model, load_model
    from tensorflow.keras.layers import Input, Embedding, LSTM, Dense, Activation, Dropout
    from tensorflow.keras.callbacks import EarlyStopping
    from tensorflow.keras.optimizers import RMSprop

    # tweets (a DataFrame), tokenizer, max_words and max_len are defined
    # earlier in my script, e.g. tokenizer = Tokenizer(num_words=max_words)
    texts = tweets['text']
    labels = tweets['label']

    # Encode string labels as integers (0 -> normal, 1 -> hateful)
    labels = LabelEncoder().fit_transform(labels)
    labels = labels.reshape(-1, 1)

   
    X_train, X_test, Y_train, Y_test = train_test_split(texts, labels, test_size=0.20)

    tokenizer.fit_on_texts(X_train)
    sequences = tokenizer.texts_to_sequences(X_train)
    sequences_matrix = sequence.pad_sequences(sequences, maxlen=max_len)

    inputs = Input(name='inputs', shape=[max_len])
    layer = Embedding(max_words, 50, input_length=max_len)(inputs)
    layer = LSTM(64)(layer)
    layer = Dense(256, name='FC1')(layer)
    layer = Activation('relu')(layer)
    layer = Dropout(0.5)(layer)
    layer = Dense(1, name='out_layer')(layer)
    layer = Activation('sigmoid')(layer)
    model = Model(inputs=inputs, outputs=layer)

    earlyStopping = EarlyStopping(monitor='val_loss', min_delta=0.0001, 
                                  restore_best_weights=False)

    model.summary()
    model.compile(loss='binary_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])
    model.fit(sequences_matrix, Y_train, batch_size=128, shuffle=True, epochs=10,
              validation_split=0.2, callbacks=[earlyStopping])

 
    model.save("ModelsDL/LSTM.h5")


    test_sequences = tokenizer.texts_to_sequences(X_test)
    test_sequences_matrix = sequence.pad_sequences(test_sequences, maxlen=max_len)

    accr = model.evaluate(test_sequences_matrix, Y_test)

    print('Test set\n  Loss: {:0.3f}\n  Accuracy: {:0.3f}'.format(accr[0], accr[1]))


    texts = ["hope", "feel relax", "feel energy", "peaceful day"]

    tokenizer.fit_on_texts(texts)
    test_samples_token = tokenizer.texts_to_sequences(texts)
    test_samples_tokens_pad = pad_sequences(test_samples_token, maxlen=max_len)

    print(model.predict(x=test_samples_tokens_pad))

    del model

The output of print(model.predict(x=test_samples_tokens_pad)) is:

    [[0.0387207 ]
     [0.02622151]
     [0.3856796 ]
     [0.03749594]]

Text with "normal" sentiment results closer to 0.Also text with "hateful" sentiment results closer to 1.

As you can see in the output, my results are consistent, since all of these texts have a "normal" sentiment.

However, after I load my model, I always get different results. Here is my code:

texts = ["hope", "feel relax", "feel energy", "peaceful day"] # same texts

model = load_model("ModelsDL/LSTM.h5")
tokenizer.fit_on_texts(texts)
test_samples_token = tokenizer.texts_to_sequences(texts)
test_samples_tokens_pad = pad_sequences(test_samples_token, maxlen=max_len)

print(model.predict(x=test_samples_tokens_pad))

Output of print(model.predict(x=test_samples_tokens_pad)):

    [[0.9838583 ]
     [0.99957573]
     [0.9999665 ]
     [0.9877912 ]]

As you can see, the same LSTM model now treats the texts as if they had a hateful context.

What should I do to fix this problem?

EDIT: I solved the problem. I saved the tokenizer that was used during model training. Then, at prediction time, I loaded that saved tokenizer and used it to tokenize the texts, instead of calling tokenizer.fit_on_texts(texts) on the texts to be predicted.
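
For reference, a minimal sketch of what that fix looks like (the tokenizer.pickle filename is just an example name, not the actual path I used):

    import pickle
    from tensorflow.keras.models import load_model
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    # After training: persist the fitted tokenizer next to the model
    with open("ModelsDL/tokenizer.pickle", "wb") as handle:
        pickle.dump(tokenizer, handle)

    # At prediction time: reload it and do NOT call fit_on_texts again,
    # otherwise the word index no longer matches the one used for training
    with open("ModelsDL/tokenizer.pickle", "rb") as handle:
        tokenizer = pickle.load(handle)

    model = load_model("ModelsDL/LSTM.h5")
    texts = ["hope", "feel relax", "feel energy", "peaceful day"]
    test_samples_token = tokenizer.texts_to_sequences(texts)
    test_samples_tokens_pad = pad_sequences(test_samples_token, maxlen=max_len)  # same max_len as training
    print(model.predict(test_samples_tokens_pad))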

EMRE BİNNAZ
  • The next step would be comparing the weights of the model vs the loaded model. [get weights](https://stackoverflow.com/questions/43715047/how-do-i-get-the-weights-of-a-layer-in-keras). pls let us know if this does not help on root cause. thanks – simpleApp May 09 '21 at 14:29
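
As the comment suggests, a quick way to check whether the save/load round trip altered the model is to compare the weights of the in-memory model with the reloaded one (a minimal sketch, assuming the freshly trained model is still available as model):

    import numpy as np
    from tensorflow.keras.models import load_model

    loaded = load_model("ModelsDL/LSTM.h5")

    # If the file round trip worked, every weight tensor should match (up to float precision)
    for w_trained, w_loaded in zip(model.get_weights(), loaded.get_weights()):
        print(np.allclose(w_trained, w_loaded))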

1 Answer


In your train/test split you need to set a random state to get reproducible results. For example: X_train, X_test, Y_train, Y_test = train_test_split(texts, labels, test_size=0.20, random_state=15). Try different states such as 1, 2, 3, 4, ... Once you get the result you like, you can save the model and reuse it later with the same random state. Hope this solves your problem.

Risky