How do you calculate the amount of RAM needed? I'm assuming that you mean just inference, no training.
The paper "Reducing Activation Recomputation in Large Transformer Models" has good information on calculating the size of a Transformer layer.
b: batchsize
s: sequence length
l: layers
a: attention heads
h: hidden dimension
p: bytes of precision
activations per layer = s*b*h*(34 + (5*a*s)/h)
The paper calculates this at 16-bit precision, so the result is in bytes at 2 bytes per value. If we divide by 2 we get a count of values, which we can then multiply by however many bytes of precision we actually use.
activations = l * ((5/2)*a*b*s^2 + 17*b*h*s) #divided by 2, simplified, and summed over all layers
total = p * (params + activations)
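The two formulas above can be sketched in Python (the helper names are mine, not from the paper):

```python
# Activation count per the simplified formula above. This is a count of
# values, not bytes, because the paper's 16-bit byte count has already
# been divided by 2.
def activation_elements(l, a, b, s, h):
    return l * ((5 / 2) * a * b * s**2 + 17 * b * h * s)

# Total memory in bytes: p = bytes per value (4 for fp32, 2 for fp16).
def total_bytes(params, activations, p):
    return p * (params + activations)
```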
Let's take Llama 2 7B as an example:
params = 7*10^9
p = 4 #bytes of precision (32-bit)
b = 1 #batchsize
s = 2048 #sequence length
l = 32 #layers
a = 32 #attention heads
h = 4096 #hidden dimension
activations => 15,300,820,992
p * (activations + params) => about 89 GB
Note you can drastically reduce the memory needed by quantization: at 4-bit (p = 0.5) that comes down to roughly 11 GB.
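To see how quantization scales the total, here is a minimal sketch under the same formula; the helper is my own and it ignores real-world overhead such as quantization scale/zero-point tensors and KV-cache layout:

```python
# Hypothetical helper: total memory in GB at a given bit width,
# applying the same bytes-per-value factor to weights and activations.
def total_gb(params, activations, bits):
    bytes_per_value = bits / 8
    return bytes_per_value * (params + activations) / 1e9
```

In practice most quantization schemes only compress the weights and keep activations at 16 bits, so treat this as an optimistic lower bound.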
I hope that helps and that I didn't miss anything important.