I installed StableLM on a GCP VM with these specs:

1 × NVIDIA Tesla P4, 8 vCPUs, 30 GB memory.

And I set the model parameter llm_int8_enable_fp32_cpu_offload=True. But it takes too long to answer questions, ~8 minutes; it was faster even on CPU alone, ~2 minutes. I downloaded the repository directly from the official GitHub link and I'm running the notebook from it. What am I doing wrong? (I installed the NVIDIA driver and CUDA, and the code finds nvidia-smi.)
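For context, here is roughly how that flag gets passed when loading the model (a minimal sketch assuming the notebook uses `transformers` with `bitsandbytes`; the model id is illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "stabilityai/stablelm-tuned-alpha-7b"  # illustrative; adjust to the notebook's model

# 8-bit quantization; the fp32 CPU offload flag lets layers that don't fit
# in VRAM run on the CPU in float32 instead of failing to load.
quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # lets accelerate split the model across GPU and CPU
)

# Show where each module landed; anything mapped to "cpu" runs in fp32
# on the CPU and will dominate latency.
print(model.hf_device_map)
```

If most of the device map reads `"cpu"`, latencies in the minutes are expected: the Tesla P4 has only 8 GB of VRAM, so a 7B model doesn't fully fit even in 8-bit, and the offloaded layers run on the CPU in fp32.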

Also, when I remove the llm_int8_enable_fp32_cpu_offload=True parameter, the code doesn't work at all. It throws an error (screenshot omitted here). Even after I upgraded the machine to 16 vCPUs / 104 GB of memory, it still shows the same error.
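For reference, the GPU's own memory (separate from the VM's system RAM) can be checked with a quick snippet:

```python
import torch

# Total VRAM on the GPU; note this is independent of the VM's
# 30 GB (or upgraded 104 GB) of system RAM.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB")
```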

  • You would most probably need at least `g2-standard-4` or `a2-highgpu-1g` to get the model working properly. – alvas Aug 22 '23 at 13:32
  • Can the reason for this slowness be eliminated by setting the correct model parameters? – srls01 Aug 28 '23 at 06:10

1 Answer


The resources themselves look fine; I recommend looking at the machine (and GPU) type, as @alvas mentioned.

Here is a link to a discussion of StableLM's system requirements, with some recommendations for optimal performance. [1]
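If you want to try one of those machine types, creating the VM looks roughly like this (a sketch, not a tested command; the instance name, zone, and image family are assumptions to adjust for your project):

```
gcloud compute instances create stablelm-vm \
    --zone=us-central1-a \
    --machine-type=g2-standard-4 \
    --image-family=pytorch-latest-gpu \
    --image-project=deeplearning-platform-release \
    --maintenance-policy=TERMINATE \
    --boot-disk-size=200GB
```

`g2-standard-4` bundles an NVIDIA L4 with 24 GB of VRAM, which gives the model far more headroom than the P4's 8 GB.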

[1] https://github.com/Stability-AI/StableLM/issues/17