I am trying my hand at ELMo, simply by using it as part of a larger PyTorch model. A basic example is given in the AllenNLP documentation, which describes it as follows:
"This is a torch.nn.Module subclass that computes any number of ELMo representations and introduces trainable scalar weights for each. For example, this code snippet computes two layers of representations (as in the SNLI and SQuAD models from our paper):"
from allennlp.modules.elmo import Elmo, batch_to_ids
options_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"
# Compute two different representations for each token.
# Each representation is a linear weighted combination of the
# 3 layers in ELMo (i.e., the character CNN and the outputs of the two biLSTMs)
elmo = Elmo(options_file, weight_file, 2, dropout=0)
# use batch_to_ids to convert sentences to character ids
sentences = [['First', 'sentence', '.'], ['Another', '.']]
character_ids = batch_to_ids(sentences)
embeddings = elmo(character_ids)
# embeddings['elmo_representations'] is a length-two list of tensors.
# Each element contains one layer of ELMo representations with shape
# (2, 3, 1024).
# 2 - the batch size
# 3 - the sequence length of the batch
# 1024 - the length of each ELMo vector
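To make concrete what I mean by a "representation": as I understand it, each one is a separate trainable scalar-weighted mix of ELMo's three internal layers. The following is only my own sketch of that mixing in pure Python with toy numbers (the function name and the 2-dimensional vectors are made up; real ELMo layers are 1024-dimensional):

```python
import math

def scalar_mix(layers, weights, gamma=1.0):
    """Softmax-normalize one scalar weight per layer, then return the
    weighted sum of the per-layer vectors, scaled by gamma."""
    exps = [math.exp(w) for w in weights]
    total = sum(exps)
    norm = [e / total for e in exps]
    dim = len(layers[0])
    return [gamma * sum(norm[k] * layers[k][i] for k in range(len(layers)))
            for i in range(dim)]

# Toy stand-ins for the three ELMo layers (char CNN + two biLSTM outputs)
layers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# With equal (e.g. freshly initialized) scalar weights, the mix is
# just the element-wise mean of the layers:
print(scalar_mix(layers, [0.0, 0.0, 0.0]))  # ≈ [3.0, 4.0]
```

If each returned representation has its own mixing weights but all of them start from the same initialization, that would also explain why the list entries look identical before any training.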
My question concerns these 'representations'. How do they compare to the output vectors of a normal word2vec model? You can choose how many representations ELMo returns (adding an extra dimension to the output), but what is the difference between the generated representations, and what is their typical use?
To give you an idea: for the code above, embeddings['elmo_representations']
returns a list of two items (the two representation layers), but they are identical.
In short, how should one interpret the 'representations' in ELMo?