Questions tagged [huggingface-trainer]

33 questions
4 votes · 1 answer

What is the official way to run a wandb sweep with Hugging Face (HF) transformers so that all the HF features work, e.g. distributed training?

Initially I wanted to set up a Hugging Face run such that, if the user wanted to run a sweep, they could (merging the sweep parameters with the given command-line arguments), or just execute the run with the command-line arguments alone. The merging is so that the train…
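One route that keeps the Trainer's own machinery in the loop is its built-in hyperparameter search, which has a wandb backend. A minimal sketch, assuming training_args, train_ds, and eval_ds are defined elsewhere (the model name and sweep space are illustrative, not from the question):

```python
from transformers import AutoModelForSequenceClassification, Trainer

def model_init(trial=None):
    # the Trainer re-instantiates the model for every sweep run
    return AutoModelForSequenceClassification.from_pretrained("bert-base-cased")

trainer = Trainer(model_init=model_init, args=training_args,
                  train_dataset=train_ds, eval_dataset=eval_ds)

def wandb_hp_space(trial):
    # a wandb sweep config, in the shape the wandb backend expects
    return {
        "method": "random",
        "metric": {"name": "objective", "goal": "minimize"},
        "parameters": {
            "learning_rate": {"distribution": "uniform", "min": 1e-6, "max": 1e-4},
        },
    }

best_run = trainer.hyperparameter_search(backend="wandb",
                                         hp_space=wandb_hp_space, n_trials=10)
```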
3 votes · 3 answers

How to fix "Trainer: evaluation requires an eval_dataset" in Huggingface Transformers?

I’m trying to fine-tune without an evaluation dataset. For that, I’m using the following code: training_args = TrainingArguments( output_dir=resume_from_checkpoint, evaluation_strategy="epoch", per_device_train_batch_size=1, ) def…
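For reference, evaluation_strategy="epoch" tells the Trainer to evaluate every epoch, which is what makes it demand an eval_dataset. A minimal sketch of the two consistent setups:

```python
from transformers import TrainingArguments

# Option 1: no evaluation dataset, so don't request evaluation.
training_args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="no",       # "epoch" makes the Trainer expect an eval_dataset
    per_device_train_batch_size=1,
)
# Option 2: keep evaluation_strategy="epoch" and pass eval_dataset=... to Trainer.
```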
2 votes · 0 answers

Can I add a 'dropout_rate' configuration to Seq2SeqTrainer?

I'm trying to train a T5 model using Seq2SeqTrainer. I found out that the config of the T5 model looks like this: T5Config { "_name_or_path": "allenai/tk-instruct-base-def-pos", "architectures": [ "T5ForConditionalGeneration" ], …
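Since dropout_rate is a field of T5Config rather than of the trainer, one option is to override it when loading the model. A sketch (the value 0.2 is just an example):

```python
from transformers import AutoModelForSeq2SeqLM

# from_pretrained forwards unrecognized kwargs to the config, so the
# checkpoint's dropout_rate can be overridden at load time
model = AutoModelForSeq2SeqLM.from_pretrained(
    "allenai/tk-instruct-base-def-pos",
    dropout_rate=0.2,
)
```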
1 vote · 1 answer

CUDA out of memory using the Trainer in Hugging Face during validation (training is fine)

When fine-tuning with the HF Trainer, training is fine but it fails during validation. Even reducing eval_accumulation_steps to 1 did not work. I followed the procedure in the link: Why is evaluation set draining the memory in pytorch hugging…
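A frequent cause is that the Trainer stores every evaluation batch's full logits for compute_metrics; shrinking them before accumulation caps memory. A sketch, assuming model, training_args, datasets, and compute_metrics are defined as in the question:

```python
import torch
from transformers import Trainer

def preprocess_logits_for_metrics(logits, labels):
    # keep only the predictions, dropping the vocabulary-sized logits
    return torch.argmax(logits, dim=-1)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    compute_metrics=compute_metrics,
    preprocess_logits_for_metrics=preprocess_logits_for_metrics,
)
```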
1 vote · 1 answer

Validation and Training Loss when using HuggingFace

I cannot find an explanation of how the validation and training losses are calculated when we fine-tune a model using the Hugging Face Trainer. Does anyone know where to find this information?
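For reference, the logged training loss is the average over the batches since the last logging step, while eval_loss is the mean loss over the whole evaluation set. A sketch that surfaces both once per epoch:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    evaluation_strategy="epoch",  # compute eval_loss on the eval set each epoch
    logging_strategy="epoch",     # log the averaged training loss each epoch
)
```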
1 vote · 0 answers

Invalid key: 409862 is out of bounds for size 0

How can I fix this? I wrote code to train GPT-2 on a dataset with Hugging Face, but I get an error and don't know why: IndexError …
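A size of 0 usually means the Trainer's remove_unused_columns step dropped every dataset column because none matched the model's forward() signature. A sketch of one workaround (tokenizing the dataset into input_ids/attention_mask/labels is the other):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    remove_unused_columns=False,  # keep columns even if forward() doesn't name them
)
```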
1 vote · 1 answer

IndexError when fine-tuning an Alpaca fine-tuned model

I’m relatively new to Hugging Face, and I’m facing an error I’m not able to debug when trying to fine-tune a Vigogne model on my own data. First, some context: I’m running everything in a Jupyter Notebook on AWS SageMaker (Instance…
1 vote · 1 answer

How to continue training with HuggingFace Trainer?

When training a model with the Hugging Face Trainer object, e.g. from https://www.kaggle.com/code/alvations/neural-plasticity-bert2bert-on-wmt14 from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments import os os.environ["WANDB_DISABLED"] =…
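A sketch of the usual resumption call, assuming a Trainer whose output_dir already contains checkpoints (the explicit path is an example):

```python
# resume from the latest checkpoint found in output_dir
trainer.train(resume_from_checkpoint=True)
# or resume from a specific checkpoint directory
# trainer.train(resume_from_checkpoint="out/checkpoint-500")
```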
0 votes · 0 answers

partial-ized forward method for a torch model does not work well with multi-GPU jobs

I am trying to understand why re-assigning the forward method of a PyTorch model object leads to the following error in a multi-GPU prediction job (configured automatically by the Hugging Face Trainer): RuntimeError: Expected all tensors to be on the…
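A sketch of the pattern and a workaround; the output_hidden_states kwarg below is illustrative, and the device explanation is an assumption based on the error in the question:

```python
from functools import partial
import torch.nn as nn

# Failing pattern: partial() captures the bound forward of one concrete
# instance, so nn.DataParallel replicas may still route tensors through
# the original module on device 0.
model.forward = partial(model.forward, output_hidden_states=True)

# A wrapper module tends to survive replication, because forward is
# defined on the class and self.inner resolves per replica:
class ForwardWrapper(nn.Module):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, *args, **kwargs):
        return self.inner(*args, output_hidden_states=True, **kwargs)
```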
0 votes · 0 answers

Llama+LoRA: training loss drops straight to 0 on the full dataset (~14k) but is OK on sample data (10 samples)

I am trying to fine-tune the LLaMA model with Low-Rank Adaptation (LoRA) based on Hugging Face. When I train the model on the full dataset (~14k), the training loss drops to 0 and stays at 0 from epoch 2 onward [plots: train loss (full), eval loss (full)]. But the loss trend…
0 votes · 0 answers

After training the model using SFT, how do I load the model?

I have trained the model with the following code: from datasets import load_dataset from trl import SFTTrainer from transformers import AutoModel, DataCollatorForLanguageModeling, AutoTokenizer, TrainingArguments from peft import LoraConfig #…
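A sketch of reloading, assuming the SFTTrainer saved a PEFT adapter to an output directory named "out" (the path is an example):

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# resolves the base model from the adapter's adapter_config.json
model = AutoPeftModelForCausalLM.from_pretrained("out")
tokenizer = AutoTokenizer.from_pretrained("out")
# optional: fold the adapter into the base weights for plain inference
model = model.merge_and_unload()
```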
0 votes · 1 answer

Fine-tuning a multiclass, multilabel wav2vec2 model with transformers

I have managed to adapt the Hugging Face audio classification tutorial to my own dataset: https://github.com/mirix/messaih/blob/main/charts/fine_tune_w2v.py I can now fine-tune a wav2vec model on my dataset. I am currently fine-tuning a classifier on…
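As far as I can tell, wav2vec2's classification head computes a softmax cross-entropy, so one route to multi-label targets is a Trainer subclass that swaps in BCEWithLogitsLoss. A sketch (the class name is illustrative):

```python
import torch
from transformers import Trainer

class MultiLabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")          # multi-hot float targets
        outputs = model(**inputs)
        loss = torch.nn.BCEWithLogitsLoss()(outputs.logits, labels.float())
        return (loss, outputs) if return_outputs else loss
```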
0 votes · 0 answers

How to load LoRA weights saved locally?

I am currently training a model and have saved the checkpoints for the LoRA adapters. I now have the .bin and .config files for the adapters. How do I reload everything for inference without pushing to the Hugging Face Hub? Most of the documentation talks…
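PeftModel.from_pretrained accepts a local directory, so no Hub push is needed. A sketch (the base checkpoint and adapter path are examples, not from the question):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")          # example base model
model = PeftModel.from_pretrained(base, "checkpoints/lora-adapter")
model.eval()
```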
0 votes · 0 answers

How to use the CER on the validation data when training the model using the Trainer API?

I am using the Hugging Face Trainer API to fine-tune an ASR model, e.g. https://huggingface.co/openai/whisper-tiny. In a callback function, I call the evaluate API to calculate the CER metric. …
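Rather than a callback, CER can be computed on every evaluation pass via compute_metrics. A sketch, assuming Seq2SeqTrainingArguments(predict_with_generate=True) and a WhisperProcessor named processor defined elsewhere:

```python
import evaluate

cer_metric = evaluate.load("cer")

def compute_metrics(pred):
    label_ids = pred.label_ids
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id  # undo loss masking
    pred_str = processor.batch_decode(pred.predictions, skip_special_tokens=True)
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)
    return {"cer": cer_metric.compute(predictions=pred_str, references=label_str)}
```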
0 votes · 0 answers

Example: how to implement a custom trainer with transformers

Using PyTorch and the transformers library, I am trying to use bert-base-cased for a regression task. This is how I implement the dataset: class CustomDataset(Dataset): def __init__(self, data, maxlen, tokenizer, target_cols): self.df =…
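For a regression target, a custom Trainer may be unnecessary, since BertForSequenceClassification switches to an MSE loss when configured for regression. A sketch:

```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased",
    num_labels=1,                  # one target; raise for multiple target columns
    problem_type="regression",     # makes the head use MSELoss
)
```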