Highest Voted 'deepspeed' Questions

2

votes

0 answers

Training time for dolly-v2-12b on a custom dataset with an A10 GPU

Hi, I am trying to train dolly-v2-12b, or any of the Dolly models, on a custom dataset using an A10 GPU. I am coding in PyCharm on Windows. The task is similar to Q&A. I am trying to use this as a communication assistant that can answer the…

asked Jul 28 '23 at 04:51

Sneha T S

21
6

2

votes

1 answer

How can I use decaying learning rate in DeepSpeed?

I am training dolly2.0. When I do so, I get the following output from the terminal: If I use DeepSpeed to perform this training, I note that the learning rate didn't improve: Why didn't the learning rate improve? This is the DeepSpeed config that…

python databricks-dolly deepspeed

asked Jul 18 '23 at 09:12

AndyLinOuO

21
2

1

vote

1 answer

Does Vertex AI Training for Distributed Training Across Multi-Nodes Work With HuggingFace Trainer + DeepSpeed?

I am wondering whether Vertex AI Training can be used for distributed training with the Hugging Face Trainer and DeepSpeed. All I have seen are examples with the native torch distribution strategy. It would be very helpful if someone can tell me if…

huggingface-transformers google-cloud-vertex-ai deepspeed

asked Aug 02 '23 at 13:28

esdy

11
2

1

vote

0 answers

DeepSpeed tensor parallelism runs into a tensor alignment problem when using the tokenizer

I tried to use DeepSpeed for tensor parallelism on StarCoder because I have multiple small GPUs, none of which can hold the whole model on its own. from transformers import AutoModelForCausalLM, AutoTokenizer import os import torch import…

python pytorch transformer-model huggingface deepspeed

asked Jul 30 '23 at 06:32

ddaa

49
2

1

vote

0 answers

How to set the maximum GPU memory use for each device when using DeepSpeed for distributed training?

I am new to DeepSpeed and have some experience in deep learning. I want to know how to set the maximum GPU memory to use for each device when using DeepSpeed. I have not tried anything yet and have no ideas. My GPU has about 46 GB of memory, and I want to run long…

out-of-memory distributed-training deepspeed

asked Jul 24 '23 at 07:39

hjc

9
3

1

vote

0 answers

DeepSpeed: no operator matches operands error

When I try to use a DeepSpeed example to fine-tune an OPT 1.3b model on my local machine, I get an unexpected error, which is related to the following code snippet: template <typename T> __global__ void moe_res_matmul(T* residual, T* coef, T* mlp_out, int…

deepspeed opt 1.3b

asked Jun 15 '23 at 06:46

coderLMN

3,076
1
21
26

0

votes

0 answers

You are using ZeRO-Offload with a client provided optimizer () which in most cases will yield poor performance

I am using PyTorch Lightning's DeepSpeed strategy to train a model, and I receive this error. What are the different ways of fixing it, and what are their pros and cons? deepspeed.runtime.zero.utils.ZeRORuntimeException: You are using ZeRO-Offload with a…

pytorch pytorch-lightning deepspeed

asked Aug 15 '23 at 12:59

Zachary Nagler

751
1
8
16

0

votes

0 answers

Loading a HF Model on Multiple GPUs and Running Inference on Those GPUs (Not Training or Fine-Tuning)

Is there any way to load a Hugging Face model across multiple GPUs and use those GPUs for inference as well? For example, there is this model, which can be loaded on a single GPU (default cuda:0) and run for inference as below: from transformers import…

huggingface multi-gpu accelerate inference-engine deepspeed

asked Aug 13 '23 at 18:33

NeuralAI

43
2
10

0

votes

0 answers

Why does the DeepSpeed `estimate_zero2_model_states_mem_needs_…` API report the same memory per CPU with different `offload_optimizer` option values?

The example provided in Memory Requirements - DeepSpeed 0.10.1 documentation is as follows: python -c 'from deepspeed.runtime.zero.stage_1_and_2 import estimate_zero2_model_states_mem_needs_all_cold;…

gpu cpu deepspeed

asked Jul 29 '23 at 14:10

Shawn Yuxuan Tong

1
1

0

votes

0 answers

How to add Deepspeed Activation Checkpointing to LLM for Fine-Tuning in PyTorch Lightning?

I'm trying to enable activation checkpointing for a T5-3b model to significantly free up GPU memory. However, it's not quite clear how to do the implementation for an LLM. Based on the PTL docs, it's something like this: from lightning.pytorch…

python pytorch pytorch-lightning deepspeed fine-tuning

asked Jul 06 '23 at 18:32

Riley Hun

2,541
5
31
77
