Questions tagged [deepspeed]

10 questions
2
votes
0 answers

Training time for dolly-v2-12b on a custom dataset with an A10 GPU

Hi, I am trying to train dolly-v2-12b, or any of the Dolly models, on a custom dataset using an A10 GPU. I am coding in PyCharm on Windows. The task is similar to Q&A. I am trying to use this as a communication assistant that can answer the…
2
votes
1 answer

How can I use decaying learning rate in DeepSpeed?

I am training Dolly 2.0. When I do so, I get the following output from the terminal: If I use DeepSpeed to perform this training, I notice that the learning rate didn't improve: Why didn't the learning rate improve? This is the DeepSpeed config that…
AndyLinOuO
  • 21
  • 2
1
vote
1 answer

Does Vertex AI Training for Distributed Training Across Multiple Nodes Work With Hugging Face Trainer + DeepSpeed?

I am wondering if Vertex AI Training can be used for distributed training with the Hugging Face Trainer and DeepSpeed. All I have seen are examples with the native torch distribution strategy. It would be very helpful if someone could tell me if…
1
vote
0 answers

DeepSpeed tensor parallelism runs into a tensor alignment problem when using a tokenizer

I tried to use DeepSpeed for tensor parallelism on StarCoder because I have multiple small GPUs, none of which can hold the whole model on its own. from transformers import AutoModelForCausalLM, AutoTokenizer import os import torch import…
ddaa
  • 49
  • 2
1
vote
0 answers

How to set the maximum GPU memory usage for each device when using DeepSpeed for distributed training?

I am new to DeepSpeed and have some experience in deep learning. I want to know how to set the maximum GPU memory to use for each device when using DeepSpeed. I have not tried anything yet and have no ideas. My GPU device has about 46G, and I want to run long…
hjc
  • 9
  • 3
1
vote
0 answers

DeepSpeed: no operator matches operands error

When I try to use a DeepSpeed example to fine-tune an OPT 1.3b model on my local machine, I get an unexpected error, which is related to the following code snippet: template __global__ void moe_res_matmul(T* residual, T* coef, T* mlp_out, int…
coderLMN
  • 3,076
  • 1
  • 21
  • 26
0
votes
0 answers

You are using ZeRO-Offload with a client provided optimizer () which in most cases will yield poor performance

I am using PyTorch Lightning's DeepSpeed strategy to train a model, and I receive this error. What are the different ways of fixing it, and what are their pros and cons? deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a…
Zachary Nagler
  • 751
  • 1
  • 8
  • 16
0
votes
0 answers

Loading an HF Model on Multiple GPUs and Running Inference on Those GPUs (Not Training or Fine-Tuning)

Is there any way to load a Hugging Face model on multiple GPUs and use those GPUs for inference as well? For example, there is this model, which can be loaded on a single GPU (default cuda:0) and run for inference as below: from transformers import…
0
votes
0 answers

Why does the DeepSpeed `estimate_zero2_model_states_mem_needs_…` API report the same memory per CPU with different `offload_optimizer` option values?

The example provided in Memory Requirements - DeepSpeed 0.10.1 documentation is as follows: python -c 'from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_cold;…
0
votes
0 answers

How to add DeepSpeed Activation Checkpointing to an LLM for Fine-Tuning in PyTorch Lightning?

I'm trying to enable activation checkpointing for a T5-3b model to significantly free up GPU memory. However, it's not quite clear how to do the implementation for an LLM. Based on the PTL docs, it's something like this: from lightning.pytorch…
Riley Hun
  • 2,541
  • 5
  • 31
  • 77