
In my work, I trained a Word2Vec model on my own corpus using gensim. Then I used several small corpora to "update" that model (producing different sets of vectors). This process is well documented in gensim.

I am trying to replicate a similar process with a GloVe model. I could find the code to train my own GloVe model here. However, I am not sure how to go about updating this model with different new corpora. Does it even make sense to "update" a GloVe model?

This answer says no, but a stronger confirmation would help.

Imrul Huda
  • While the gensim process can inherently be continued with incremental training, that doesn't mean it's well-founded. Such updates that only touch some subset of words, with arbitrary choices of relative original-dataset influence/learning-rate vs newer-subset influence/learning-rate, may work up to a point, with careful choices/evaluation, but also risk degrading the internal-comparability of words. (So "well documented" doesn't necessarily mean "good idea".) – gojomo Sep 22 '20 at 21:33
  • I'm less familiar with GloVe, so can't speak authoritatively, but my understanding of its optimization method is that in its original definition it requires all examples present. And, even if a partial-update/further-tuning could be grafted-on, it'd face the same risks: unless you're really including all examples in a training session, the balance between newer-influences & original-data risks creating incomparable drift between subsets of the model's words. – gojomo Sep 22 '20 at 21:36
  • Can you explain the problem of updating a little more to a novice? Maybe use one or two simple sentences as an example. – Imrul Huda Sep 23 '20 at 18:26
  • Hard to do in this format, but a quick try, very broadly, and vaguely: imagine training a Word2Vec model on a (still toy-sized) corpus with 1000 unique words & 100,000 training-words. Training the model to 'convergence' means your vectors/model are, within the constraints you've set, as good as possible at predicting the training data. Great! But now there's 10 new words you want to teach it, & maybe you're lucky enough to have 1000 training words including many examples of these new words, mixed with a bunch of already-known words. … – gojomo Sep 23 '20 at 20:55
  • The most grounded approach would be to start from scratch: now you have a 1010-word vocabulary, and 101,000 training words, & you train enough to make the model as good as it can be, & all words get similar treatment, in an interleaved tug-of-war, from the same starting point. But, your model may not be comparable in any interesting way with the earlier one. (The algorithm intentionally uses some randomization, & is subject to jitter from typical multithreaded implementations, so the word 'apple' might be in very-different places each run, even though relative-to-others each is just as good.) … – gojomo Sep 23 '20 at 20:58
  • So what if instead you just tried training the new-1000-words into the old model? Well, then you have tricky choices with no clear right answers about what learning-rate to use, how much training to do, & so forth. Many of your original 1000 unique words may not appear at all in the 1000 new training words. Those vectors won't change at all with the new training. But all words in the new training data will, as part of the new, constrained tug-of-war optimization. … – gojomo Sep 23 '20 at 21:01
  • Each step that makes them slightly better for the new data may (& to some extent certainly does) make them slightly *worse* with respect to all the other non-repeated contexts (which are still likely important for the overall model). The simple shortcut, training on just the new texts, might work ok, if done a little, with some lucky parameter choices, but really *isn't* reproducing the all-against-all optimization that ensures all word-vectors relate against all others. – gojomo Sep 23 '20 at 21:04
  • @gojomo I think this deserves to be put in an answer – Sergey Bushmanov Sep 23 '20 at 21:09

0 Answers