
While reading the paper "Convolutional Neural Networks for Sentence Classification" by Yoon Kim (New York University), I noticed that it implements the "CNN-non-static" model: a model initialized with pre-trained vectors from word2vec, in which all words (including unknown ones, which are randomly initialized) and the pre-trained vectors are fine-tuned for each task. I just do not understand how the pre-trained vectors are fine-tuned for each task. As far as I know, the input vectors, which are converted from strings by the pre-trained word2vec.bin, are like an image matrix, which cannot change during CNN training. So, if they can, how? Please help me out. Thanks a lot in advance!

ad absurdum

1 Answer


The word embeddings are weights of the neural network, and can therefore be updated during backpropagation.

E.g., from http://sebastianruder.com/word-embeddings-1/:

Naturally, every feed-forward neural network that takes words from a vocabulary as input and embeds them as vectors into a lower dimensional space, which it then fine-tunes through back-propagation, necessarily yields word embeddings as the weights of the first layer, which is usually referred to as Embedding Layer.
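For illustration, here is a minimal sketch (not from the thread; the gensim/Keras library choices, the `vec.bin` path, and the toy vocabulary are assumptions) of loading pre-trained word2vec vectors into a trainable embedding layer, so that backpropagation can fine-tune them during training:

```python
import numpy as np
import tensorflow as tf
from gensim.models import KeyedVectors

# Load pre-trained word2vec vectors (the file name "vec.bin" is a placeholder).
w2v = KeyedVectors.load_word2vec_format("vec.bin", binary=True)

# Toy vocabulary for the downstream task; in practice, build it from your corpus.
vocab = ["movie", "great", "boring", "unseenword"]
dim = w2v.vector_size

# Known words get their word2vec vector; unknown words keep a random
# initialization, mirroring the CNN-non-static setup described in the question.
embedding_matrix = np.random.uniform(-0.25, 0.25, size=(len(vocab), dim)).astype("float32")
for i, word in enumerate(vocab):
    if word in w2v:
        embedding_matrix[i] = w2v[word]

# trainable=True makes the embedding matrix part of the network's weights, so
# the optimizer updates (fine-tunes) it like any other layer; trainable=False
# would freeze the vectors instead, giving the CNN-static variant.
embedding_layer = tf.keras.layers.Embedding(
    input_dim=len(vocab),
    output_dim=dim,
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=True,
)
```

The key point is that the embedding matrix is just the weight matrix of the first layer, so gradients flow into it during backpropagation exactly as they do for the convolutional filters.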

Franck Dernoncourt
  • Thanks for your reply, now I get it, but I still have some questions; I hope you can help me out. Thanks a lot! Here are my questions: – Prince of Persia Oct 19 '16 at 14:44
  • 1. When I train my own CNN for text classification, I use word2vec to initialize the words, then simply employ these pre-trained vectors as my input features to train the CNN. If I never have an embedding layer, it surely cannot do any fine-tuning through back-propagation. My question is: if I want to fine-tune, does that mean creating an Embedding layer? And how do I create it? – Prince of Persia Oct 19 '16 at 14:51
  • 2. When we train word2vec, we use unsupervised training, right? In my case, I use the skip-gram model to get my pre-trained word2vec. But when I have the vec.bin and use it in the text classification model (CNN) as my word initializer, if I fine-tune the word-to-vector map in vec.bin, does that mean I have to have a CNN net structure exactly the same as the one used when training my word2vec? – Prince of Persia Oct 19 '16 at 14:57
  • 3. Are the skip-gram and CBOW models only used for unsupervised word2vec training, or can they also be applied to other general text classification tasks? And what is the difference between the networks for word2vec unsupervised training and supervised fine-tuning? – Prince of Persia Oct 19 '16 at 15:09
  • I hope you are not losing your patience! I would really appreciate it if you could help me out. Thank you again! – Prince of Persia Oct 19 '16 at 15:10
  • @PrinceofPersia Comments unfortunately often get deleted on Stack Exchange. Could you please open a new thread for each of these three questions? That way other people can also answer, as I may be too busy to look at it in the near future. Thanks! – Franck Dernoncourt Oct 19 '16 at 15:55
  • @FranckDernoncourt Thank you for telling me about that. I have started a new post; please check it out when you get some time. I really need some precious first-hand knowledge from an expert like you! (I am an undergraduate, and I'm currently applying to graduate schools, so it's an important time for me to gain this knowledge.) Really appreciate it! Here's my new post: [link](http://stackoverflow.com/questions/40143405/how-to-fine-tune-word2vec-when-training-our-cnn-for-text-classification) – Prince of Persia Oct 20 '16 at 00:09