StableLM answers too slowly on a GCP VM with a GPU

Question

I installed StableLM on a GCP VM with these specs:

1 x NVIDIA Tesla P4, 8 vCPU - 30 GB memory.

I set the model parameter `llm_int8_enable_fp32_cpu_offload=True`, but it takes too long to answer a question, about 8 minutes. It was faster even on CPU alone, about 2 minutes. I downloaded the repository directly from the official GitHub link and I'm running the notebook from there. What am I doing wrong? (I installed the NVIDIA driver and CUDA, and the code can find `nvidia-smi`.)

Also, when I remove the `llm_int8_enable_fp32_cpu_offload=True` parameter, the code does not work at all. It throws this error (I upgraded the VM to 16 vCPU and 104 GB memory, but it still shows this error):

You would most probably need at least `g2-standard-4` or `a2-highgpu-1g` to get the model working properly. — alvas, Aug 22 '23 at 13:32
Can the cause of this slowness be eliminated by setting the correct model parameters? — srls01, Aug 28 '23 at 06:10

score 0 · Answer 1 · answered Aug 28 '23 at 15:37

The resources you are using seem fine. I recommend looking at the CPU type, as @alvas mentioned.
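
One way to check whether the slowness comes from CPU offload is to see where the model's modules were placed. When a model is loaded with `device_map` in the Hugging Face `transformers` library, the placement is recorded in `model.hf_device_map`. The following is a minimal sketch, assuming the notebook loads the model that way; the helper `summarize_device_map` and the example map are hypothetical and only for illustration.

```python
from collections import Counter


def summarize_device_map(device_map: dict) -> Counter:
    """Count how many modules are placed on each kind of device."""
    counts = Counter()
    for device in device_map.values():
        # GPU entries appear as integer indices (0, 1, ...) or strings like "cuda:0".
        if isinstance(device, int) or str(device).startswith("cuda"):
            counts["gpu"] += 1
        else:
            counts[str(device)] += 1  # typically "cpu" or "disk"
    return counts


# Hypothetical map for illustration; with a real model, pass model.hf_device_map.
example_map = {
    "embed_in": 0,
    "layers.0": 0,
    "layers.1": "cpu",
    "layers.2": "cpu",
    "embed_out": "cpu",
}
summary = summarize_device_map(example_map)
print(summary)
```

If most modules land on `cpu`, every generated token has to pass through layers running on the CPU, which would be consistent with the slow responses described in the question.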

For reference, here is a discussion of the StableLM system specs, with some recommendations for optimal performance. [1]

[1] https://github.com/Stability-AI/StableLM/issues/17
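
As a rough sanity check on the specs, the memory needed for the weights alone can be estimated from the parameter count and the precision. The sketch below assumes parameter counts of 3B and 7B for the StableLM alpha checkpoints and 8 GB of memory on the Tesla P4; neither figure is stated in the question, and the helper names are made up for illustration.

```python
# Rough estimate of the memory needed just to hold model weights.
# Assumptions (not from the original post): 3B and 7B parameter counts
# for the StableLM alpha checkpoints, and 8 GB of memory on the Tesla P4.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB (weights only)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


def fits_on_gpu(num_params: float, dtype: str, gpu_gib: float) -> bool:
    """True if the weights alone fit; activations and the KV cache need extra room."""
    return weight_memory_gib(num_params, dtype) < gpu_gib


if __name__ == "__main__":
    gpu_gib = 8.0  # assumed Tesla P4 memory
    for name, params in [("3B", 3e9), ("7B", 7e9)]:
        for dtype in ("fp16", "int8"):
            size = weight_memory_gib(params, dtype)
            fits = fits_on_gpu(params, dtype, gpu_gib)
            print(f"{name} {dtype}: {size:.1f} GiB, fits in {gpu_gib:.0f} GiB: {fits}")
```

Under these assumptions, 7B weights in fp16 take about 13 GiB and do not fit in 8 GB, which may be why loading fails without offload and why part of the model ends up on the CPU when offload is enabled.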
