
I want to experiment with different embeddings such as Word2Vec, ELMo, and BERT, but I'm a little confused about whether to use word embeddings or sentence embeddings, and why. I'm using the embeddings as input features to an SVM classifier.

Thank you.

NST

1 Answer


Though both approaches can prove effective on different datasets, as a rule of thumb I would advise you to use word embeddings when your input consists of only a few words, and sentence embeddings when your input is longer (e.g. large paragraphs).
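One common way to feed word embeddings to an SVM (an assumption here; the answer does not name a pooling method) is to average the word vectors of each input. This yields one fixed-length feature vector no matter how many words the input has, whereas a sentence-embedding model outputs such a vector directly. A minimal sketch, using a hypothetical three-dimensional embedding table in place of real Word2Vec vectors:

```python
from typing import Dict, List

# Hypothetical toy embedding table (dimension 3). In practice these vectors
# would come from a pretrained model such as Word2Vec.
EMBEDDINGS: Dict[str, List[float]] = {
    "good": [0.9, 0.1, 0.0],
    "movie": [0.2, 0.8, 0.1],
    "bad": [-0.8, 0.1, 0.0],
    "plot": [0.1, 0.7, 0.3],
}
DIM = 3


def mean_pool(tokens: List[str]) -> List[float]:
    """Average the vectors of known tokens into one fixed-length feature vector."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not known:
        # No known words: fall back to an all-zero vector of the same length.
        return [0.0] * DIM
    return [sum(vec[i] for vec in known) / len(known) for i in range(DIM)]


# Inputs of different lengths map to vectors of the same length (DIM),
# which is the shape an SVM expects for each feature row.
short_features = mean_pool(["good", "movie"])
long_features = mean_pool("bad movie with a bad plot".split())
```

Averaging discards word order, which is one reason sentence embeddings tend to be preferred for longer inputs.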

Alex Metsai
  • Thank you for your answer. If I used word embeddings, each input would be of a different length; should I pad them with zeros? – NST Jul 02 '21 at 12:29
  • It doesn't exactly work this way: with most word embedding systems that I know, you define the TOTAL number of possible words (the vocabulary size) as the input dimension. See this for example: `https://github.com/Eligijus112/word-embedding-creation/blob/master/master.py#L51` – Alex Metsai Jul 02 '21 at 12:34
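The vocabulary size sets the input dimension of an embedding layer (for example the `input_dim` argument of Keras's `Embedding` layer), while the sequences of word indices fed into such a layer are still commonly padded to a shared length, with index 0 reserved for padding. A minimal pure-Python sketch of both ideas; `build_vocab` and `encode_and_pad` are hypothetical helper names, and right-padding is a choice made here (Keras's `pad_sequences` pads at the front by default):

```python
from typing import Dict, List

PAD_ID = 0  # index 0 is reserved for padding, so real words start at 1


def build_vocab(texts: List[str]) -> Dict[str, int]:
    """Assign each distinct word an integer id, starting at 1."""
    vocab: Dict[str, int] = {}
    for text in texts:
        for word in text.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def encode_and_pad(texts: List[str], vocab: Dict[str, int], max_len: int) -> List[List[int]]:
    """Map words to ids, truncate to max_len, then right-pad with PAD_ID."""
    padded = []
    for text in texts:
        ids = [vocab[w] for w in text.split() if w in vocab][:max_len]
        padded.append(ids + [PAD_ID] * (max_len - len(ids)))
    return padded


texts = ["good movie", "bad movie with a bad plot"]
vocab = build_vocab(texts)
input_dim = len(vocab) + 1  # vocabulary size plus the padding id
sequences = encode_and_pad(texts, vocab, max_len=4)
```

Padded index sequences like these suit a neural model with an embedding layer; for an SVM, a pooled fixed-length vector per input is usually the more natural feature.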