Keras embedding layers: how do they work?

Question

I am starting to use Keras to build neural network models.

I have a classification problem where the features are discrete. To handle this case, the standard procedure is to convert the discrete features into binary arrays with a one-hot encoding.

However, it seems that with Keras this step is not necessary, as one can simply use an Embedding layer to create a feature-vector representation of these discrete features.

How are these embeddings performed?

My understanding is that, if the discrete feature f can take k values, then an embedding layer creates a matrix with k columns. Every time I receive a value for that feature, say i, during the training phase, only the i-th column of the matrix will be updated.

Is my understanding correct?

Adria Ciurana · Answer 1 · 2019-01-16T10:44:39.263

Suppose you have N objects that do not directly have a mathematical representation, for example words.

As neural networks can only work with tensors, you need some way to translate those objects into tensors. The solution is a giant matrix (the embedding matrix) that relates the index of each object to its tensor representation:

object_index_1: vector_1
object_index_2: vector_2
...
object_index_N: vector_N

Selecting the vector of a specific object can be expressed as a matrix product:

$$e = M v$$

where v is the one-hot vector that determines which object needs to be translated, M is the embedding matrix, and e is the resulting vector.
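As a worked instance, take the 2x4 matrix M used in the example below (each column holds one object's vector) and the one-hot vector v for the second object, 'dog':

$$
M v =
\begin{pmatrix} 1 & 1 & 2 & 3 \\ 1 & 2 & 2 & 3 \end{pmatrix}
\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 1 \\ 2 \end{pmatrix}
$$

Only the column of M that lines up with the single 1 in v survives the product.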

The usual pipeline is as follows:

We have a list of objects.

objects = ['cat', 'dog', 'snake', 'dog', 'mouse', 'cat', 'dog', 'snake', 'dog']

We transform these objects into indices (after computing the unique objects).

unique = ['cat', 'dog', 'snake', 'mouse']  # list(dict.fromkeys(objects)) keeps first-seen order
objects_index = [0, 1, 2, 1, 3, 0, 1, 2, 1]  # [unique.index(o) for o in objects]

We transform these indices into one-hot vectors (remember that each vector has a single 1, at the position of the index):

objects_one_hot = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0],
     [0, 0, 0, 1], [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]  # [[int(i == x) for i in range(len(unique))] for x in objects_index]
# objects_one_hot is a 9x4 matrix (9 objects, 4 unique values)

We create or use the embedding matrix:

import numpy as np

# M is a dim x 4 matrix (dim is the number of dimensions you want the vectors to have).
# In this case dim = 2; each column of M is the vector of one unique object.
M = np.array([[1, 1], [1, 2], [2, 2], [3, 3]]).T  # or... np.random.rand(2, 4)
# objects_vectors = M @ one-hot, transposed back to one row per object
objects_vectors = [[1, 1], [1, 2], [2, 2], [1, 2],
    [3, 3], [1, 1], [1, 2], [2, 2], [1, 2]]  # M.dot(np.array(objects_one_hot).T).T
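The steps above hard-code every intermediate list. Here is a minimal NumPy sketch that computes them from `objects` instead. It assumes the unique objects are kept in first-seen order, which is the order the hard-coded `unique` list uses.

```python
import numpy as np

objects = ['cat', 'dog', 'snake', 'dog', 'mouse', 'cat', 'dog', 'snake', 'dog']

# Unique objects in first-seen order (dict keys preserve insertion order).
unique = list(dict.fromkeys(objects))

# Map every object to its index in `unique`.
objects_index = [unique.index(o) for o in objects]

# One row per object, one column per unique value: a 9x4 one-hot matrix.
objects_one_hot = np.eye(len(unique), dtype=int)[objects_index]

# Embedding matrix: dim x 4, one column per unique object (dim = 2 here).
M = np.array([[1, 1], [1, 2], [2, 2], [3, 3]]).T

# M @ one_hot.T is dim x 9; transpose to get one row (vector) per object.
objects_vectors = (M @ objects_one_hot.T).T

print(unique)                    # ['cat', 'dog', 'snake', 'mouse']
print(objects_index)             # [0, 1, 2, 1, 3, 0, 1, 2, 1]
print(objects_vectors.tolist())  # [[1, 1], [1, 2], [2, 2], [1, 2], [3, 3], [1, 1], [1, 2], [2, 2], [1, 2]]
```

`list(set(objects))` would also give the unique objects, but in an arbitrary order, so the indices would not necessarily match the ones listed above.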

Normally the embedding matrix is learned along with the rest of the model, so that it adapts to the best vector for each object. We now have a mathematical representation of the objects!

As you have seen, we used a one-hot encoding followed by a matrix product. What this really does is take the column of M that represents that object.
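A small check of that claim, reusing the 2x4 matrix M from above: multiplying by a one-hot vector gives the same result as reading the column directly, so the product never has to be computed in practice.

```python
import numpy as np

M = np.array([[1, 1], [1, 2], [2, 2], [3, 3]]).T  # 2x4, one column per object

i = 2                              # index of 'snake'
v = np.zeros(M.shape[1], dtype=int)
v[i] = 1                           # one-hot vector for index i

via_product = M @ v                # full matrix-vector product
via_slice = M[:, i]                # just read column i

print(via_product, via_slice)      # [2 2] [2 2]
```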

During training, M is adapted to improve the representation of the objects, and as a consequence the loss goes down.
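To see why training only changes the column of the object that was actually looked up (the behaviour described in the question), here is one gradient step done by hand with NumPy. The squared-error loss, the target vector and the learning rate are arbitrary choices for illustration; in Keras the gradient is computed automatically.

```python
import numpy as np

M = np.array([[1, 1], [1, 2], [2, 2], [3, 3]], dtype=float).T  # 2x4 embedding matrix
i = 1                                  # we look up 'dog'
v = np.zeros(4)
v[i] = 1.0                             # one-hot input

target = np.array([0.0, 0.0])          # hypothetical target vector
e = M @ v                              # forward pass: e = M v = column i of M
grad_e = 2 * (e - target)              # dL/de for L = ||e - target||^2
grad_M = np.outer(grad_e, v)           # dL/dM = (dL/de) v^T, zero outside column i

lr = 0.1                               # hypothetical learning rate
M_new = M - lr * grad_M

# Only column i has changed; every other column is untouched.
changed = np.any(M_new != M, axis=0)
print(changed)                         # [False  True False False]
```

Because v is zero everywhere except position i, the outer product `grad_e v^T` is zero in every other column, which is why only the selected vector is updated.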

A very nice detailed explanation. Thanks! – London guy, Apr 18 '20 at 14:51

score 8 · Accepted Answer · answered Mar 13 '17 at 13:07

As one may easily notice, multiplying a one-hot vector by an embedding matrix can be performed in constant time (independent of the vocabulary size), because it amounts to slicing the matrix. This is exactly what an Embedding layer does during computation: it simply selects the appropriate index using a gather backend function. This means that your understanding of an Embedding layer is correct.
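The same equivalence for a whole batch of indices, as a NumPy sketch. Note the layout: Keras stores the embedding weights as an (input_dim, output_dim) matrix, one row per index, so the lookup selects rows rather than the columns used in the first answer. `np.take` stands in for the backend gather here; this is an illustration, not Keras's own code.

```python
import numpy as np

vocab_size, dim = 4, 2
W = np.random.rand(vocab_size, dim)       # one row per index, Keras-style layout

indices = np.array([0, 1, 2, 1, 3])       # a batch of integer inputs

# Gather: pick the rows directly; the work grows with len(indices) * dim,
# not with vocab_size.
gathered = np.take(W, indices, axis=0)    # shape (5, 2)

# Equivalent but wasteful: build one-hot rows and multiply.
one_hot = np.eye(vocab_size)[indices]     # shape (5, 4)
multiplied = one_hot @ W                  # shape (5, 2)

print(np.allclose(gathered, multiplied))  # True
```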

prosti · Answer 3 · 2019-03-15T08:25:23.707

The Embedding layer in Keras (and in general) is a way to create a dense word encoding. You can think of it as a matrix multiplication with a one-hot-encoding (OHE) matrix, or simply as a linear layer (without a bias term) over the OHE matrix.

It is always used as a layer attached directly to the input.

Sparse and dense word encodings differ in how efficiently they encode words.

One-hot encoding (OHE) is a sparse word-encoding model. For example, if we have 1000 input activations (a vocabulary of 1000 words), each input word is encoded as a 1000-element OHE vector.

Let's say we know that some input activations are dependent, and that we have 64 latent features. We would then have this embedding:

from keras.layers import Embedding

e = Embedding(1000, 64, input_length=50)

Here 1000 means we plan to encode 1000 words in total, 64 means we use a 64-dimensional vector space, and 50 means each input document has 50 words.

The weights of the Embedding layer are initialized with random non-zero values, and these parameters are then learned during training.
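As a rough NumPy sketch of what `Embedding(1000, 64, input_length=50)` allocates: a 1000x64 weight matrix, i.e. 64,000 trainable parameters. The uniform range of ±0.05 below is meant to mirror Keras's default `"uniform"` initializer; treat that as an assumption and check the documentation for your Keras version.

```python
import numpy as np

input_dim, output_dim = 1000, 64   # vocabulary size, embedding size
# input_length (50) only describes the inputs; it adds no parameters.

rng = np.random.default_rng(0)
# Random starting values (assumed range -0.05..0.05); these are learned in training.
W = rng.uniform(-0.05, 0.05, size=(input_dim, output_dim))

print(W.shape)   # (1000, 64)
print(W.size)    # 64000
```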

There are other parameters you can set when creating the Embedding layer.

What is the output from the Embedding layer?

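The output of the Embedding layer is a 2D matrix for each input sequence of words (input document), with one embedding per word. With the layer above, each 50-word document becomes a 50x64 matrix, so a batch of documents has shape (batch_size, 50, 64).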

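NOTE: If you wish to connect a Dense layer directly to an Embedding layer and have it see the whole document at once, you must first flatten the 2D output matrix into a 1D vector using the Flatten layer. (A Dense layer applied to the unflattened output acts on each word's embedding separately.)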
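A NumPy sketch of the shapes involved, assuming a batch of 8 documents of 50 word indices each (the batch size and the random data are made up for illustration): the lookup yields a 50x64 matrix per document, and flattening turns each into a 3200-element vector that a Dense layer can take as a single input.

```python
import numpy as np

vocab_size, dim, doc_len, batch = 1000, 64, 50, 8

rng = np.random.default_rng(0)
W = rng.uniform(-0.05, 0.05, size=(vocab_size, dim))       # embedding weights
docs = rng.integers(0, vocab_size, size=(batch, doc_len))  # word indices

embedded = W[docs]                  # shape (8, 50, 64): one 50x64 matrix per document
flat = embedded.reshape(batch, -1)  # what Flatten does: shape (8, 3200)

print(embedded.shape, flat.shape)   # (8, 50, 64) (8, 3200)
```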

score 0 · Answer 4 · answered Apr 28 '20 at 12:43

When we deal with words and sentences in any area (for example, NLP), we like to represent them as vectors so that the machine can easily identify each word and use it for mathematical modelling. Say, for example, we have 10 words in our vocabulary and we want to represent each word uniquely. The easiest way to do that is to assign a number to each word, create a vector with 10 elements, and activate only the element with that number while deactivating all the others. For example, say our vocabulary contains the word dog and we have assigned it the number 3. The vector will then look like this:

{0,0,1,0,0,0,0,0,0,0}
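The same vector built in Python, assuming the numbering in the example starts at 1 (so word number 3 sets the third element); the helper name `one_hot` is just for illustration.

```python
def one_hot(number, vocab_size):
    """Return a list with a 1 at position `number` (counting from 1), 0 elsewhere."""
    vec = [0] * vocab_size
    vec[number - 1] = 1
    return vec

dog = one_hot(3, 10)
print(dog)   # [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
```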

Similarly, other words activate other elements. The above example is very simple but highly inefficient. Say we have 100000 words in the vocabulary: to represent them we need 100000 vectors of size 1x100000. To do this task efficiently we can use embeddings, which represent words in a dense form (say, a vector with just 32 elements). For example, with a 2-element embedding, dog could be represented as

{0.24,0.97}

which is much more efficient and better suited to mathematical modelling.
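A quick back-of-the-envelope comparison for the 100000-word vocabulary, assuming 4-byte floats and the 32-element dense vectors mentioned above (both numbers are illustrative):

```python
vocab_size = 100_000
dense_dim = 32
bytes_per_float = 4   # assumption: float32 storage

one_hot_bytes_per_word = vocab_size * bytes_per_float   # 400000 bytes
dense_bytes_per_word = dense_dim * bytes_per_float      # 128 bytes

print(one_hot_bytes_per_word, dense_bytes_per_word)     # 400000 128
print(one_hot_bytes_per_word // dense_bytes_per_word)   # 3125
```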
