
Hi, I'm just getting started with understanding transformer-based models and I can't find how the token embeddings are arrived at. There are multiple tokenization approaches and multiple vocabularies/corpora that LLMs are trained on, so my questions are:

  1. Does each LLM also train its own token embeddings?
  2. How do those pre-trained embeddings work for transfer learning or fine-tuning on custom datasets, where some OOV words may be present or where we have special unique tokens we want to keep? (A sketch of what I mean by adding special tokens is below.)
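For context, this is roughly the kind of thing I mean by keeping special tokens, assuming the Hugging Face transformers API (the checkpoint name and tokens below are just placeholders, not my actual setup):

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Load a pre-trained tokenizer and model (placeholder checkpoint)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# Domain words the pre-trained vocabulary would split into sub-word pieces,
# plus a custom marker I want kept as a single token
new_tokens = ["myspecialterm", "[CUSTOM_MARKER]"]
num_added = tokenizer.add_tokens(new_tokens)

# The embedding matrix has to grow to match the enlarged vocabulary;
# the new rows start out randomly initialized and would presumably only
# become useful after fine-tuning on my data
model.resize_token_embeddings(len(tokenizer))

print(tokenizer.tokenize("myspecialterm appears with [CUSTOM_MARKER]"))
```

Is this the right way to think about it, i.e. the new embedding rows are learned from scratch during fine-tuning while the rest are reused?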
dasman
