Questions tagged [glove]

GloVe is an unsupervised learning algorithm for obtaining vector representations for words (word embeddings). See https://nlp.stanford.edu/projects/glove/ for more information.

92 questions
28 votes · 4 answers

How to Train GloVe algorithm on my own corpus

I tried to follow this, but somehow I wasted a lot of time and ended up with nothing useful. I just want to train a GloVe model on my own corpus (a ~900 MB corpus.txt file). I downloaded the files provided in the link above and compiled them using Cygwin…
asked by Codir
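
A minimal sketch of the usual route, assuming the Stanford toolkit from the link above has already been compiled with make, so that the vocab_count, cooccur, shuffle and glove binaries exist under build/. The flags mirror the project's demo.sh; paths and hyperparameters are placeholders:

    import subprocess

    CORPUS = "corpus.txt"  # the ~900 MB plain-text corpus

    steps = [
        f"build/vocab_count -min-count 5 -verbose 2 < {CORPUS} > vocab.txt",
        f"build/cooccur -memory 4.0 -vocab-file vocab.txt -window-size 15 < {CORPUS} > cooccurrence.bin",
        "build/shuffle -memory 4.0 < cooccurrence.bin > cooccurrence.shuf.bin",
        ("build/glove -save-file vectors -threads 8 -input-file cooccurrence.shuf.bin "
         "-x-max 10 -iter 15 -vector-size 100 -binary 2 -vocab-file vocab.txt -verbose 2"),
    ]
    for cmd in steps:
        subprocess.run(cmd, shell=True, check=True)  # writes vectors.txt when the last step finishes

The resulting vectors.txt can then be loaded like any other GloVe file.
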
25 votes · 2 answers

What's the major difference between glove and word2vec?

What is the difference between word2vec and GloVe? Are both ways to train a word embedding? If yes, then how can we use both?
asked by Hrithik Puri
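
In short, word2vec learns vectors by predicting words from local context windows, while GloVe factorises global co-occurrence statistics; both produce word embeddings of the same shape. A small sketch, assuming the gensim downloader models are available, showing that both can be used the same way once loaded:

    import gensim.downloader as api

    # Both loads return a KeyedVectors object, so downstream code is identical.
    glove = api.load("glove-wiki-gigaword-100")   # GloVe, 100-d, Wikipedia + Gigaword
    w2v = api.load("word2vec-google-news-300")    # word2vec, 300-d, Google News (~1.6 GB download)

    print(glove.most_similar("king", topn=3))
    print(w2v.most_similar("king", topn=3))
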
23 votes · 2 answers

What is "unk" in the pretrained GloVe vector files (e.g. glove.6B.50d.txt)?

I found "unk" token in the glove vector file glove.6B.50d.txt downloaded from https://nlp.stanford.edu/projects/glove/. Its value is as follows: unk -0.79149 0.86617 0.11998 0.00092287 0.2776 -0.49185 0.50195 0.00060792 -0.25845 0.17865 0.2535…
asked by Abhay Gupta
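
A quick way to inspect this yourself, assuming glove.6B.50d.txt sits in the working directory; the "unk" row there appears to be an ordinary corpus token rather than a designated unknown-word marker:

    import numpy as np

    vectors = {}
    with open("glove.6B.50d.txt", encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)

    # "unk" is present as a plain token; check whether a dedicated "<unk>" entry exists at all.
    print("unk" in vectors, "<unk>" in vectors)
    print(vectors["unk"][:5])
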
16 votes · 1 answer

Is it possible to freeze only certain embedding weights in the embedding layer in pytorch?

When using GloVe embeddings in NLP tasks, some words from the dataset might not exist in GloVe. Therefore, we instantiate random weights for these unknown words. Would it be possible to freeze the weights taken from GloVe and train only the newly…
asked by rcshon
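
One common pattern (a sketch, not the only option) is to keep a single nn.Embedding and zero out the gradients of the GloVe rows after backward(), so that only the randomly initialised rows are updated. glove_weights and pretrained_idx below are placeholders for your own vocabulary bookkeeping:

    import torch
    import torch.nn as nn

    vocab_size, dim = 10_000, 100
    glove_weights = torch.randn(vocab_size, dim)   # placeholder for the real GloVe matrix
    pretrained_idx = torch.arange(0, 8_000)        # rows that came from GloVe

    emb = nn.Embedding(vocab_size, dim)
    with torch.no_grad():
        emb.weight.copy_(glove_weights)

    optimizer = torch.optim.Adam(emb.parameters(), lr=1e-3)

    def training_step(batch_ids, loss_fn):
        optimizer.zero_grad()
        loss = loss_fn(emb(batch_ids))
        loss.backward()
        emb.weight.grad[pretrained_idx] = 0.0      # GloVe rows receive no update
        optimizer.step()
        return loss.item()

An alternative is to keep two embedding layers, one frozen and one trainable, and route token indices to the right one.
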
13 votes · 2 answers

Converting tokens to word vectors effectively with TensorFlow Transform

I would like to use TensorFlow Transform to convert tokens to word vectors during my training, validation and inference phases. I followed this StackOverflow post and implemented the initial conversion from tokens to vectors. The conversion works as…
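
A sketch of the lookup itself, using plain TensorFlow ops that can also live inside a TF Transform preprocessing_fn; vocab and embedding_matrix are assumed to be built from the GloVe file beforehand (row i corresponds to vocab[i], with a final row reserved for out-of-vocabulary tokens):

    import numpy as np
    import tensorflow as tf

    vocab = ["the", "cat", "sat"]                                            # placeholder vocabulary
    embedding_matrix = np.random.rand(len(vocab) + 1, 50).astype("float32")  # +1 OOV row

    table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(vocab, list(range(len(vocab)))),
        default_value=len(vocab))                  # unknown tokens map to the last row

    def tokens_to_vectors(tokens):                 # tokens: a tf.string tensor
        ids = table.lookup(tokens)
        return tf.nn.embedding_lookup(embedding_matrix, ids)

    print(tokens_to_vectors(tf.constant(["cat", "dog"])).shape)  # (2, 50)
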
7 votes · 1 answer

Using pretrained glove word embedding with scikit-learn

I have used Keras with pre-trained word embeddings, but I am not quite sure how to do it with a scikit-learn model. I need to do this in sklearn as well because I am using vecstack to ensemble both a Keras sequential model and an sklearn model. This is what…
asked by BlueMango
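
One common approach (a sketch, assuming glove is a {word: np.ndarray} dict loaded from e.g. glove.6B.100d.txt) is a small transformer that averages the GloVe vectors of each document, so that any sklearn estimator, and hence vecstack, can consume the result:

    import numpy as np
    from sklearn.base import BaseEstimator, TransformerMixin
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    class MeanGloveVectorizer(BaseEstimator, TransformerMixin):
        def __init__(self, glove, dim=100):
            self.glove, self.dim = glove, dim

        def fit(self, X, y=None):
            return self

        def transform(self, X):
            # X is an iterable of raw text documents
            out = np.zeros((len(X), self.dim), dtype=np.float32)
            for i, doc in enumerate(X):
                vecs = [self.glove[w] for w in doc.split() if w in self.glove]
                if vecs:
                    out[i] = np.mean(vecs, axis=0)
            return out

    # model = make_pipeline(MeanGloveVectorizer(glove), LogisticRegression(max_iter=1000))
    # model.fit(train_texts, train_labels)
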
7 votes · 0 answers

Glove Pytorch speed up

I am trying to implement the GloVe algorithm in PyTorch. This is the first time I am using PyTorch, and I think my implementation might not be very efficient. Apart from the obvious (vectorizing the for loop that is run every batch), is there anything…
asked by jonasus
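
For reference, a vectorised sketch of one GloVe update in PyTorch: a whole batch of (i, j, X_ij) triples is processed with tensor ops instead of a Python loop. Sizes are placeholders; the objective is the standard weighted least-squares GloVe loss:

    import torch
    import torch.nn as nn

    V, D = 20_000, 100
    w, w_ctx = nn.Embedding(V, D), nn.Embedding(V, D)      # word and context vectors
    b, b_ctx = nn.Embedding(V, 1), nn.Embedding(V, 1)      # word and context biases
    params = (list(w.parameters()) + list(w_ctx.parameters())
              + list(b.parameters()) + list(b_ctx.parameters()))
    opt = torch.optim.Adagrad(params, lr=0.05)

    def glove_step(i, j, x_ij, x_max=100.0, alpha=0.75):
        # i, j: LongTensors of word/context indices; x_ij: co-occurrence counts (float)
        weight = torch.clamp(x_ij / x_max, max=1.0) ** alpha
        pred = (w(i) * w_ctx(j)).sum(dim=1) + b(i).squeeze(1) + b_ctx(j).squeeze(1)
        loss = (weight * (pred - torch.log(x_ij)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()
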
7 votes · 3 answers

Improving on the basic, existing GloVe model

I am using GloVe as part of my research. I've downloaded the models from here. I've been using GloVe for sentence classification. The sentences I'm classifying are specific to a particular domain, say some STEM subject. However, since the existing…
asked by cs95
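
One option sometimes used here is the third-party mittens package (pip install mittens), which retrofits pretrained GloVe vectors to a domain co-occurrence matrix. The sketch below follows the package's documented usage and should be treated as an assumption rather than a verified recipe; all inputs are tiny placeholders:

    import numpy as np
    from mittens import Mittens

    domain_vocab = ["protein", "enzyme", "cell"]                  # placeholder domain vocabulary
    domain_cooccurrence = np.random.rand(3, 3) * 5.0              # placeholder co-occurrence counts
    glove_dict = {w: np.random.rand(50) for w in domain_vocab}    # placeholder pretrained 50-d vectors

    model = Mittens(n=50, max_iter=1000)                          # n must match the pretrained dimension
    domain_vectors = model.fit(domain_cooccurrence,
                               vocab=domain_vocab,
                               initial_embedding_dict=glove_dict)  # rows follow domain_vocab order
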
6 votes · 2 answers

How to use a pre-trained embedding matrix in tensorflow 2.0 RNN as initial weights in an embedding layer?

I'd like to use a pretrained GloVe embedding as the initial weights for an embedding layer in an RNN encoder/decoder. The code is in TensorFlow 2.0. Simply adding the embedding matrix as a weights = [embedding_matrix] parameter to the…
asked by Jake
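
A sketch of the approach that tends to work in TF 2.x: pass the matrix through embeddings_initializer (tf.keras.initializers.Constant) rather than weights=[...]. The matrix below is a placeholder for the real GloVe one:

    import numpy as np
    import tensorflow as tf

    vocab_size, dim = 5_000, 100
    embedding_matrix = np.random.rand(vocab_size, dim)   # placeholder for the real GloVe matrix

    embedding_layer = tf.keras.layers.Embedding(
        vocab_size, dim,
        embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
        mask_zero=True,
        trainable=False)        # set trainable=True to fine-tune the vectors

    encoder_inputs = tf.keras.Input(shape=(None,), dtype="int32")
    x = embedding_layer(encoder_inputs)
    encoder_outputs, state = tf.keras.layers.GRU(256, return_state=True)(x)
    encoder = tf.keras.Model(encoder_inputs, [encoder_outputs, state])
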
5 votes · 1 answer

Averaging a sentence’s word vectors in Keras - pre-trained word embedding

I am new to Keras. My goal is to create a neural network for multi-class sentiment analysis of tweets. I used the Sequential model in Keras to build my model. I want to use pre-trained word embeddings in the first layer of my model, specifically…
asked by HelpASisterOut
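
A sketch of the usual pattern: a frozen Embedding layer seeded with GloVe followed by GlobalAveragePooling1D, which averages each tweet's word vectors before the classifier layers. Sizes and the matrix are placeholders:

    import numpy as np
    import tensorflow as tf

    vocab_size, dim, num_classes = 20_000, 100, 3
    embedding_matrix = np.random.rand(vocab_size, dim)   # placeholder GloVe matrix

    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(
            vocab_size, dim,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            trainable=False),
        tf.keras.layers.GlobalAveragePooling1D(),        # mean of the word vectors per tweet
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
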
4 votes · 2 answers

Using torch.nn.Embedding for GloVe: should we fine-tune the embeddings or just use them as they are?

While transfer learning / fine-tuning recent language models such as BERT and XLNet is by far a very common practice, how does this work for GloVe? Basically, I see two options when using GloVe to get dense vector representations that can be used by…
asked by pedjjj
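
Both options boil down to one flag on nn.Embedding.from_pretrained; the tensor below is a placeholder for the stacked GloVe matrix. Freezing is a common choice when the task data is small, while fine-tuning can help when there is plenty of in-domain data:

    import torch
    import torch.nn as nn

    glove_tensor = torch.randn(400_000, 300)    # placeholder for the stacked GloVe matrix

    frozen = nn.Embedding.from_pretrained(glove_tensor, freeze=True)    # use the vectors as-is
    tunable = nn.Embedding.from_pretrained(glove_tensor, freeze=False)  # update them with the task loss
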
3 votes · 3 answers

Why can't I download a dataset with the Gensim download API

When I run the below: >>> import gensim.downloader as api >>> model = api.load("glove-twitter-25") # load glove vectors, the gensim.downloader API throws the below error: [Errno 2] No such file or…
asked by vtim
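
A sketch of how this is usually debugged, assuming a recent gensim: check the model's metadata and its on-disk path (the downloader stores everything under ~/gensim-data). A half-finished earlier download in that folder is a frequent cause of the "No such file or directory" error; deleting the model's subfolder before re-running api.load forces a clean re-download:

    import gensim.downloader as api

    print(api.info("glove-twitter-25"))                     # size, file name, checksum
    path = api.load("glove-twitter-25", return_path=True)   # where the files are expected to live
    print(path)

    model = api.load("glove-twitter-25")                    # KeyedVectors once the download is intact
    print(model.most_similar("tweet", topn=3))
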
3 votes · 0 answers

Glove implementation with PySpark

I'm working with word embeddings using PySpark; in particular, I'm working with Word2Vec. Now, I'd like to try GloVe instead of Word2Vec but, apparently, there is no GloVe implementation for PySpark, only for Scala. Is…
asked by Davide
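
As a workaround (a sketch that applies pretrained vectors rather than training GloVe in Spark): broadcast the vectors to the executors and look tokens up with a UDF. glove.6B.50d.txt is assumed to be readable on the driver:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, FloatType

    spark = SparkSession.builder.getOrCreate()

    glove = {}
    with open("glove.6B.50d.txt", encoding="utf-8") as f:
        for line in f:
            word, *vals = line.rstrip().split(" ")
            glove[word] = [float(v) for v in vals]
    bc_glove = spark.sparkContext.broadcast(glove)          # ship the dict to the executors once

    @udf(returnType=ArrayType(ArrayType(FloatType())))
    def embed(tokens):
        zero = [0.0] * 50
        return [bc_glove.value.get(t, zero) for t in tokens]

    df = spark.createDataFrame([(["the", "cat"],)], ["tokens"])
    df.withColumn("vectors", embed("tokens")).show(truncate=False)
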
3 votes · 0 answers

GloVe embeddings - unknown / out-of-vocabulary token

I would like to know if there is a general (default) out-of-vocabulary (OOV) token for GloVe embeddings. In particular for the pre-trained ones from Stanford: https://nlp.stanford.edu/projects/glove/ I found this on SO: What is "unk" in…
asked by MBT
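
The pretrained Stanford files do not appear to ship a designated OOV token (see the "unk" question above), so a commonly cited workaround is to use the mean of all vectors, or a small random vector, for unknown words. A sketch, assuming glove.6B.300d.txt is available locally:

    import numpy as np

    vectors = []
    with open("glove.6B.300d.txt", encoding="utf-8") as f:
        for line in f:
            vectors.append(np.asarray(line.rstrip().split(" ")[1:], dtype=np.float32))

    unk_vector = np.mean(vectors, axis=0)        # one shared vector for every OOV word
    # alternative: np.random.default_rng(0).normal(scale=0.6, size=300) per unknown word
    print(unk_vector[:5])
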
2 votes · 1 answer

ValueError: Input 0 of layer dense is incompatible with the layer: expected axis -1 of input shape to have value 896, received input shape [None,128]

I am using a CNN architecture (see code below) for a text classification task (with 5 classes). The data I am using is reviews_Home_and_Kitchen_5.json, downloaded from here. I created a sentence embedding matrix for 1000 sentences, taking the embedding…