
I'm building a text summarizer with TensorFlow using the Transformer architecture, for learning purposes, with the following parameters:

encoder vocab size: 100000
decoder vocab size: 10000
encoder maxlen: 1000
decoder maxlen: 80
num layers: 4
d model: 128
dff: 512
num heads: 4
batch size: 32
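For reference, this is roughly how I define them in code (plain Python; the variable names are my own shorthand, not from any library):

```python
# Transformer hyperparameters for the summarizer (names are my own shorthand)
hparams = {
    "encoder_vocab_size": 100_000,
    "decoder_vocab_size": 10_000,
    "encoder_maxlen": 1000,   # max input (article) length in tokens
    "decoder_maxlen": 80,     # max output (summary) length in tokens
    "num_layers": 4,
    "d_model": 128,
    "dff": 512,               # feed-forward inner dimension
    "num_heads": 4,
    "batch_size": 32,
}

# sanity check: d_model must split evenly across attention heads
assert hparams["d_model"] % hparams["num_heads"] == 0
depth_per_head = hparams["d_model"] // hparams["num_heads"]
print(depth_per_head)  # 32
```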

It's working fine, and I'm fairly happy with the initial results, as it's my first model.

My question is: my VRAM usage is always 6.7 GB out of 8 GB, no matter how small I make the hyperparameters. And when I try to make them larger, usage climbs to 6.7 GB and then the well-known OOM error is thrown, even though 1.3 GB is apparently still free. Any thoughts?

All I want is full utilization of my GPU, not just ~80% of it.
