
I have a Keras RNN model like this one, using pre-trained Word2Vec weights:

from keras.models import Sequential
from keras import layers as L
from keras.optimizers import Adam

model = Sequential()
model.add(L.Embedding(input_dim=vocab_size, output_dim=embedding_size,
                      input_length=max_phrase_length,
                      weights=[pretrained_weights], trainable=False))
model.add(L.LSTM(units=rnn_units))
model.add(L.Dense(vocab_size, activation='sigmoid'))
adam = Adam(lr)
model.compile(optimizer=adam, loss='cosine_proximity',
              metrics=['cosine_proximity'])

During training, I want to create a custom loss function that compares the predicted and true word vectors associated with the predicted and true integer indices.

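def custom_loss(y_true, y_pred):
    # extract_the_word_vectors_for_the_indices is the piece I don't know how to write
    A = extract_the_word_vectors_for_the_indices(y_true)
    B = extract_the_word_vectors_for_the_indices(y_pred)
    return some_backend_function(A, B)  # placeholder: some Keras backend maths on A and B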

For example, suppose my batch size is 4. Then inside model.fit I can pass y_pred through an argmax so that K.argmax(y_pred) = [i1, i2, i3, i4], integers that index the word vectors vectors[i1], vectors[i2], vectors[i3], vectors[i4]. I want to do some maths with the predicted vectors and compare them to the ground-truth vectors as a way to monitor progress (not as a loss function). So I need a "Keras-full" way to do this.

If y_true were a NumPy array of indices and word_model were my word2vec model, I could get an array of the vectors just by doing word_model.wv.vectors[y_true]. However, converting y_true from a tensor to NumPy and then back to a tensor later seems very wasteful. I can't get anything to work in native Keras, and when I try to extract the tensors as NumPy arrays and work with those, I get errors as well. Grrrr...
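To make the index-to-vector step concrete, here is a NumPy sketch of the batch-of-4 example. It assumes a tiny made-up `vectors` matrix standing in for `word_model.wv.vectors` (or `pretrained_weights`) and one-hot ground truth; both are illustrative assumptions, not taken from the real model.

```python
import numpy as np

# Hypothetical stand-in for word_model.wv.vectors: 6 words, 3-dimensional vectors
rng = np.random.default_rng(0)
vectors = rng.normal(size=(6, 3))

# Hypothetical batch of 4 predictions: one score per vocabulary entry,
# as produced by Dense(vocab_size, activation='sigmoid')
y_pred = np.array([
    [0.1, 0.9, 0.2, 0.0, 0.3, 0.1],
    [0.7, 0.1, 0.2, 0.0, 0.3, 0.1],
    [0.1, 0.2, 0.2, 0.0, 0.8, 0.1],
    [0.1, 0.2, 0.6, 0.0, 0.3, 0.1],
])
# Assumed one-hot ground truth (integer labels would skip the argmax)
y_true = np.eye(6)[[1, 0, 3, 2]]

# argmax over the vocabulary axis gives [i1, i2, i3, i4]
pred_idx = np.argmax(y_pred, axis=-1)  # -> array([1, 0, 4, 2])
true_idx = np.argmax(y_true, axis=-1)  # -> array([1, 0, 3, 2])

# Fancy indexing picks the matching rows of the word-vector matrix
B = vectors[pred_idx]  # shape (4, 3)
A = vectors[true_idx]  # shape (4, 3)

# Example "maths": per-sample cosine similarity between true and predicted vectors
cos = np.sum(A * B, axis=-1) / (np.linalg.norm(A, axis=-1) * np.linalg.norm(B, axis=-1))
```

Samples 1, 2 and 4 predict the right word, so their cosine is 1; sample 3 predicts word 4 instead of word 3, so its cosine is lower.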

I imagine there has to be a way to extract the word vectors from the embedding layer for y_pred and y_true, but I have no idea how. Anyone?
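For reference, one way to keep the whole lookup in tensors is to hold the frozen word-vector matrix as a constant and index it with `tf.gather`. This is a minimal sketch assuming TensorFlow 2.x `tf.keras` and one-hot `y_true` (standalone Keras has the same ops as `K.argmax` and `K.gather` in `keras.backend`); `make_word_vector_metric` is a hypothetical helper name, and the cosine similarity is just an example of "some maths". The matrix could be `pretrained_weights` itself or, equivalently, `model.layers[0].get_weights()[0]`, since the embedding layer is frozen.

```python
import numpy as np
import tensorflow as tf

def make_word_vector_metric(embedding_matrix):
    """Hypothetical helper: build a Keras metric comparing true and predicted word vectors.

    embedding_matrix: (vocab_size, embedding_size) array, e.g. pretrained_weights.
    Assumes y_true is one-hot over the vocabulary, like the Dense(vocab_size) output.
    """
    # Frozen copy of the word vectors as a tensor; no NumPy round trip per batch
    vectors = tf.constant(embedding_matrix, dtype=tf.float32)

    def word_vector_cosine(y_true, y_pred):
        # Index of the most likely word per sample: [i1, i2, ...]
        true_idx = tf.argmax(y_true, axis=-1)
        pred_idx = tf.argmax(y_pred, axis=-1)
        # Row lookup: the tensor equivalent of vectors[indices]
        A = tf.gather(vectors, true_idx)
        B = tf.gather(vectors, pred_idx)
        # Example maths: per-sample cosine similarity of the unit-normalized vectors
        A = tf.math.l2_normalize(A, axis=-1)
        B = tf.math.l2_normalize(B, axis=-1)
        return tf.reduce_sum(A * B, axis=-1)

    return word_vector_cosine
```

It would be passed as `metrics=[make_word_vector_metric(pretrained_weights)]` in `model.compile`. Because `argmax` has no gradient, this only works as a monitoring metric, which matches the stated goal.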

AstroBen
  • Will this approach work for you? https://stackoverflow.com/questions/46464549/keras-custom-loss-function-accessing-current-input-pattern – Manoj Mohan Mar 11 '19 at 19:32
  • Afraid not. I need to specifically extract the word vectors corresponding to each integer in the prediction and truth tensors. The word vectors are either in the embedding layer or the word2vec model. – AstroBen Mar 11 '19 at 20:23
  • What exactly does B=extract the word vectors for the indices in y_pred mean? Your network output is a dense layer of size vocab_size with a sigmoid activation, so it obviously won't be one-hot. If I understand correctly, you wish to take the index of the most probable word and then use the matching word embedding for some computation. This requires tf.argmax / K.argmax, which is not differentiable. Do correct me if I'm wrong. – ian Mar 11 '19 at 22:57
  • ian, I'm not using this for a loss function, just a metric that I can watch as the fit progresses. As for the first question, I edited the question a bit to answer that. I want to be able to do some maths between the predicted and ground-truth word vectors, in addition to just looking at cross entropy or accuracy. – AstroBen Mar 12 '19 at 00:05
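The non-differentiability point can be checked directly. A minimal sketch, assuming TensorFlow 2.x eager mode and a made-up 3-word score vector: the gradient of anything computed from `tf.argmax` comes back as `None`, which is why an argmax-based comparison can serve as a metric but not as a loss.

```python
import tensorflow as tf

# Made-up scores for a 3-word vocabulary (one sample)
y_pred = tf.Variable([[0.1, 0.9, 0.2]])

with tf.GradientTape() as tape:
    # Index of the most likely word, cast to float so it can be summed
    idx = tf.argmax(y_pred, axis=-1)
    out = tf.reduce_sum(tf.cast(idx, tf.float32))

# No gradient path runs through argmax, so this is None
grad = tape.gradient(out, y_pred)
```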

1 Answer


An easy solution is to use the functional API: build a second model that shares the embedding layer, and call your custom function with it any time you want.

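import numpy as np
from keras.models import Model
from keras.layers import Input, Embedding, LSTM, Dense
from keras.optimizers import Adam

# The Embedding layer takes integer word indices, so the input shape is (max_phrase_length,)
model_input = Input((max_phrase_length,))
# input_length is omitted so the same layer can also be called on single indices below
embedding_layer = Embedding(input_dim=vocab_size, output_dim=embedding_size,
                            weights=[pretrained_weights], trainable=False)

x = embedding_layer(model_input)
x = LSTM(units=rnn_units)(x)
x = Dense(units=vocab_size, activation='sigmoid')(x)

original_model = Model(inputs=model_input, outputs=x)
original_model.compile(optimizer=Adam(lr),
                       loss='cosine_proximity',
                       metrics=['cosine_proximity'])

# Shares embedding_layer (and its frozen word2vec weights) but looks up one index per sample
index_input = Input((1,))
embedding_model = Model(inputs=index_input, outputs=embedding_layer(index_input))

Now you can use embedding_model to look up the word vectors for the predicted and true indices. Because it calls predict, this runs on NumPy arrays outside the training graph:

def custom_loss(y_true, y_pred, embedding_model):
    # argmax over the vocabulary axis, reshaped to (batch, 1) to match index_input
    true_idx = np.argmax(y_true, axis=-1)[:, None]
    pred_idx = np.argmax(y_pred, axis=-1)[:, None]
    # predict returns (batch, 1, embedding_size); drop the length-1 axis
    A = embedding_model.predict(true_idx)[:, 0, :]
    B = embedding_model.predict(pred_idx)[:, 0, :]
    # Some function of A and B, e.g. the per-sample cosine similarity
    return np.sum(A * B, axis=-1) / (np.linalg.norm(A, axis=-1) * np.linalg.norm(B, axis=-1))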

I haven't checked the code, so it might need a little tweaking.
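Since the question wants something to watch while `fit` runs, one option is to call a NumPy-side function like `custom_loss` from a callback at the end of each epoch. This is a sketch assuming TensorFlow 2.x `tf.keras`; `WordVectorMonitor`, `x_val` and `y_val` are hypothetical names, and `metric_fn` is any function of `(y_true, y_pred)` that returns per-sample values.

```python
import numpy as np
import tensorflow as tf

class WordVectorMonitor(tf.keras.callbacks.Callback):
    """Hypothetical callback: evaluates metric_fn on held-out predictions after every epoch."""

    def __init__(self, x_val, y_val, metric_fn):
        super().__init__()
        self.x_val = x_val
        self.y_val = y_val
        self.metric_fn = metric_fn  # e.g. lambda yt, yp: custom_loss(yt, yp, embedding_model)
        self.values = []            # one averaged value per epoch

    def on_epoch_end(self, epoch, logs=None):
        # Predictions come back as NumPy arrays, so metric_fn can use np.argmax etc.
        y_pred = self.model.predict(self.x_val, verbose=0)
        value = float(np.mean(self.metric_fn(self.y_val, y_pred)))
        self.values.append(value)
        print(f"epoch {epoch + 1}: word-vector metric = {value:.4f}")
```

It would be attached with something like `original_model.fit(x, y, callbacks=[WordVectorMonitor(x_val, y_val, lambda yt, yp: custom_loss(yt, yp, embedding_model))])`. The trade-off is an extra `predict` pass per epoch, in exchange for not needing the metric to be graph-compatible.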

ian