I'm practicing image captioning and am having trouble with mismatched tensor dimensions. I have an image embedding of size [1, 512], but GPT2, which I use for caption generation, needs input of size [n, 768], where n is the number of tokens in the beginning of the caption. I don't know how I should change the dimension of my image embedding to pass it through GPT2. I thought about padding the image embedding with zeros so that it has size [1, 768], but I think that would negatively affect the resulting caption. Thank you for your help!

I've tried padding the image embedding with zeros to size [1, 768], but I don't think it will help much.
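
For reference, one common alternative to zero-padding is a learned linear projection from 512 to 768 dimensions, with the projected image vector prepended to the caption's token embeddings. The following is only a minimal sketch, assuming PyTorch. The name `image_proj` and the random tensors are hypothetical stand-ins for the real image embedding and for GPT2's token embeddings, and the projection would still need to be trained on captioning data before it is useful.

```python
import torch
import torch.nn as nn

IMAGE_DIM = 512  # size of the image embedding from the question
GPT2_DIM = 768   # hidden size of the base GPT2 model

# Hypothetical trainable projection (an assumption, not part of GPT2 itself).
image_proj = nn.Linear(IMAGE_DIM, GPT2_DIM)

# Random stand-in for the real image embedding, shape [1, 512].
image_embedding = torch.randn(1, IMAGE_DIM)

# Random stand-in for the token embeddings of the caption's beginning,
# shape [n, 768]. With Hugging Face transformers these could come from
# the model's token embedding layer, e.g. model.transformer.wte(input_ids).
n_tokens = 5
token_embeddings = torch.randn(n_tokens, GPT2_DIM)

# Project the image embedding into GPT2's embedding space: [1, 768].
image_prefix = image_proj(image_embedding)

# Prepend the image vector to the caption tokens: [n + 1, 768].
sequence = torch.cat([image_prefix, token_embeddings], dim=0)

# Add a batch dimension: [1, n + 1, 768]. A tensor of this shape can be
# passed to GPT2 through the inputs_embeds argument instead of input_ids.
inputs_embeds = sequence.unsqueeze(0)
print(inputs_embeds.shape)
```

Unlike zero-padding, the projection lets training decide how the 512 image features map onto all 768 dimensions that GPT2 expects.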

Christoph Rackwitz
kat0ewww

0 Answers