Questions tagged [deepspeed]
10 questions
2
votes
0 answers
Training time for dolly-v2-12b on a custom dataset with an A10 GPU
Hi, I am trying to train dolly-v2-12b, or any of the Dolly models, on a custom dataset using an A10 GPU. I am coding in PyCharm on Windows. The task is similar to Q&A. I am trying to use this as a communication assistant that can answer the…

Sneha T S
- 21
- 6
2
votes
1 answer
How can I use decaying learning rate in DeepSpeed?
I am training dolly2.0.
When I do so, I get the following output from the terminal:
If I use DeepSpeed to perform this training, I notice that the learning rate does not decay:
Why does the learning rate not decay?
This is the DeepSpeed config that…

AndyLinOuO
- 21
- 2
1
vote
1 answer
Does Vertex AI Training for Distributed Training Across Multi-Nodes Work With HuggingFace Trainer + DeepSpeed?
I am wondering whether Vertex AI Training can be used for distributed training with the Hugging Face Trainer and DeepSpeed. All I have seen are examples with the native torch distribution strategy.
It would be very helpful if someone can tell me
If…

esdy
- 11
- 2
1
vote
0 answers
DeepSpeed tensor parallelism has a tensor alignment problem when using a tokenizer
I tried to use DeepSpeed to run tensor parallelism on StarCoder because I have multiple small GPUs, none of which can hold the whole model on its own.
from transformers import AutoModelForCausalLM, AutoTokenizer
import os
import torch
import…

ddaa
- 49
- 2
1
vote
0 answers
How to set the maximum GPU memory usage for each device when using DeepSpeed for distributed training?
I am new to DeepSpeed and have some experience in deep learning. I want to know how to set the maximum GPU memory to use for each device when using DeepSpeed.
I have not tried anything yet and have no ideas.
My GPU has about 46 GB of memory, and I want to run long…

hjc
- 9
- 3
1
vote
0 answers
DeepSpeed: no operator matches operands error
When I try to use a DeepSpeed example to fine-tune an OPT 1.3b model on my local machine, I get an unexpected error, which is related to the following code snippet:
template <typename T>
__global__ void moe_res_matmul(T* residual, T* coef, T* mlp_out, int…

coderLMN
- 3,076
- 1
- 21
- 26
0
votes
0 answers
You are using ZeRO-Offload with a client provided optimizer () which in most cases will yield poor performance
I am using PyTorch Lightning's DeepSpeed strategy to train a model, and I receive this error. What are the different ways of fixing this, with their pros and cons?
deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a…

Zachary Nagler
- 751
- 1
- 8
- 16
0
votes
0 answers
Loading a HF Model on Multiple GPUs and Running Inference on Those GPUs (Not Training or Fine-Tuning)
Is there any way to load a Hugging Face model across multiple GPUs and use those GPUs for inference as well?
For example, this model can be loaded on a single GPU (default cuda:0) and run for inference as below:
from transformers import…

NeuralAI
- 43
- 2
- 10
0
votes
0 answers
Why does the DeepSpeed `estimate_zero2_model_states_mem_needs_…` API report the same memory per CPU with different `offload_optimizer` option values?
The example provided in the "Memory Requirements" page of the DeepSpeed 0.10.1 documentation is as follows:
python -c 'from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_cold;…
0
votes
0 answers
How to add DeepSpeed Activation Checkpointing to LLM for Fine-Tuning in PyTorch Lightning?
I'm trying to enable activation checkpointing for a T5-3b model to free up a significant amount of GPU memory. However, it's not quite clear how to implement it for an LLM. Based on the PTL docs, it's something like this:
from lightning.pytorch…

Riley Hun
- 2,541
- 5
- 31
- 77