tfidf weighted average of word embeddings with Keras

Question

I don't know how this is possible, but I want to calculated some weighted average of word embeddings in a sentence like with tfidf scores.

Is it exactly this, but with just weights:

averaging a sentence’s word vectors in Keras- Pre-trained Word Embedding

import keras
from keras.layers import Embedding
from keras.models import Sequential
import numpy as np
# Set parameters
vocab_size=1000
max_length=10
# Generate random embedding matrix for sake of illustration
embedding_matrix = np.random.rand(vocab_size,300)

model = Sequential()
model.add(Embedding(vocab_size, 300, weights=[embedding_matrix], 
input_length=max_length, trainable=False))
# Average the output of the Embedding layer over the word dimension
model.add(keras.layers.Lambda(lambda x: keras.backend.mean(x, axis=1)))

model.summary()

How could you get with a custom layer or lambda layer the proper weights belonging to a specific word? You would need access somehow the embedding layer to get the index and then look up the proper weight.

Or is there a simple way I don't see?

score 0 · Answer 1 · answered Sep 21 '20 at 12:08

embeddings = model.layers[0].get_weights()[0] # get embedding layer, shape (vocab, embedding_dim)

Alternatively, if you define the layer object:

embedding_layer = Embedding(vocab_size, 300, weights=[embedding_matrix], input_length=max_length, trainable=False)
embeddings = emebdding_layer.get_weights()[0]

From here, you can probably directly address the individual weights by just querying their positions using your unprocessed bag of words or integer inputs.

If you want to, you can additionally go through the actual word vectors by the string words, though that shouldn't be necessary for simply accumulating all word vectors of each sentence:

# `word_to_index` is a mapping (i.e. dict) from words to their index that you need to provide (from your original input data which should be ints)
word_embeddings = {w:embeddings[idx] for w, idx in word_to_index.items()}

print(word_embeddings['chair'])  # gives you the word vector

ok, but how does this answers my question? I want to apply external weights (and not just get the weights) inside a possible layer to get a weighted average. — ctiid, Sep 23 '20 at 08:28
"apply external weights inside a layer" - what does this mean? You want the trained weight vectors from each word and then average them for each sentence, right? — runDOSrun, Sep 23 '20 at 08:34

tfidf weighted average of word embeddings with Keras

1 Answers1