While transfer learning / fine-tuning recent language models such as BERT and XLNet is by now very common practice, how does this apply to GloVe?
Basically, I see two options when using GloVe to get dense vector representations that can be used by downstream NNs.
1) Fine-tune the GloVe embeddings (in PyTorch terms, gradient enabled)
2) Use the embeddings frozen (gradient disabled)
For instance, given GloVe's embedding matrix, I do:
import torch
import torch.nn as nn

# freeze=True is the default, so this embedding layer is not trained
embed = nn.Embedding.from_pretrained(torch.tensor(embedding_matrix, dtype=torch.float))
...
dense = nn.Linear(...)
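For concreteness, here is a minimal sketch of the two variants (the random weights tensor and the layer sizes are placeholders, not my actual GloVe matrix):

import torch
import torch.nn as nn

# Placeholder for the real GloVe matrix: (vocab_size, embed_dim)
weights = torch.randn(10000, 300)

# Option 2: frozen embeddings (freeze=True is PyTorch's default)
embed_frozen = nn.Embedding.from_pretrained(weights, freeze=True)

# Option 1: fine-tuned embeddings; gradients flow into the matrix
embed_tuned = nn.Embedding.from_pretrained(weights, freeze=False)

dense = nn.Linear(300, 2)  # placeholder downstream layer

# Only parameters with requires_grad=True should reach the optimizer
params = [p for p in list(embed_tuned.parameters()) + list(dense.parameters())
          if p.requires_grad]
optimizer = torch.optim.Adam(params, lr=1e-3)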
Is it best practice to use GloVe solely for the vector representations (and train only the dense layer and potentially other layers), or would one fine-tune the embedding matrix as well?