This question is rather abstract and not necessarily tied to tensorflow or keras. Say that you want to train a language model, and you want to use inputs of different sizes for your LSTMs. Particularly, I'm following this paper: https://www.researchgate.net/publication/317379370_A_Neural_Language_Model_for_Query_Auto-Completion.
The authors use, among other things, word embeddings and one-hot encoding of characters. Most likely, the dimensions of each of these inputs are different. Now, to feed that into a network, I see a few alternatives but I'm sure I'm missing something and I would like to know how it should be done.
- Create a 3D tensor of shape (instances, 2, max(embeddings,characters)). That is, padding the smaller input with 0s.
- Create a 3D tensor of shape (instances, embeddings+characters, 1)). That is, concatenating inputs.
It looks to me that both alternatives are bad for efficiently training the model. So, what's the best way to approach this? I see the authors use an embedding layer for this purpose, but technically, what does that mean?
EDIT
Here are more details. Let's call these inputs X (character-level input) and E (word-level input). On each character of a sequence (a text), I compute x, e and y, the label.
x
: character one-hot encoding. My character index is of size 38, so this is a vector filled with 37 zeros and one 1.e
: precomputed word embedding of dimension 200. If the character is a space, I fetch the word embedding of the previous word in the sequence, Otherwise, I assign the vector for incomplete word (INC
, also of size 200). Real example with the sequence "red car":r>INC, e>INC, d>INC, _>embeddings["red"], c>INC, a>INC, r>INC
.y
: the label to be predicted, which is the next character, one-hot encoded. This output is of the same dimension asx
because it uses the same character index. In the example above, for "r",y
is the one-hot encoding of "e".