I am trying to train a T5 model using Seq2SeqTrainer. The model's config looks like this:

T5Config {
  "_name_or_path": "allenai/tk-instruct-base-def-pos",
  "architectures": [
    "T5ForConditionalGeneration"
  ],
  "attention_probs_dropout_prob": 0.5,
  "d_ff": 2048,
  "d_kv": 64,
  "d_model": 768,
  "decoder_start_token_id": 0,
  "dense_act_fn": "gelu_new",
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "feed_forward_proj": "gated-gelu",
  "hidden_dropout_prob": 0.5,
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "is_gated_act": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "num_decoder_layers": 12,
  "num_heads": 12,
  "num_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_max_distance": 128,
  "relative_attention_num_buckets": 32,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.28.0",
  "use_cache": true,
  "vocab_size": 32100
}

However, Seq2SeqTrainer has no 'dropout_rate' setting. The training code for my T5Generator class looks like this:

def train(self, tokenized_datasets, **kwargs):
    """
    Train the generative model.
    """
    # Set training arguments
    args = Seq2SeqTrainingArguments(
        **kwargs
    )

    # Define trainer object
    trainer = Seq2SeqTrainer(
        self.model,
        args, ...)

Is there a way to change the dropout configuration? Or are there other settings that would have a similar effect?

Here's what I've tried:


training_args = {
    'output_dir': model_out_path,
    'evaluation_strategy': 'epoch',
    'learning_rate': 5e-5,
    'lr_scheduler_type': 'cosine',
    'per_device_train_batch_size': 8,
    'per_device_eval_batch_size': 16,
    'num_train_epochs': 4,
    'weight_decay': 0.01,
    'warmup_ratio': 0.1,
    'save_strategy': 'no',
    'load_best_model_at_end': False,
    'push_to_hub': False,
    'eval_accumulation_steps': 1,
    'predict_with_generate': True,
    'use_mps_device': use_mps_,
    'dropout_rate': 0.3,
}
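
Seq2SeqTrainingArguments is a dataclass, so forwarding an unknown key such as 'dropout_rate' through **kwargs raises a TypeError. One possible workaround is to separate model-level keys from trainer-level keys before building the arguments. The sketch below is only an illustration: the names MODEL_CONFIG_KEYS and split_kwargs are hypothetical helpers, not part of transformers.

```python
# Hypothetical helper: keys that belong to the model config, not the trainer.
MODEL_CONFIG_KEYS = {"dropout_rate"}


def split_kwargs(kwargs):
    """Split a flat dict into (model config overrides, trainer arguments)."""
    model_overrides = {k: v for k, v in kwargs.items() if k in MODEL_CONFIG_KEYS}
    trainer_kwargs = {k: v for k, v in kwargs.items() if k not in MODEL_CONFIG_KEYS}
    return model_overrides, trainer_kwargs


# Small stand-in for the dict above (values are illustrative).
example_args = {
    "learning_rate": 5e-5,
    "num_train_epochs": 4,
    "dropout_rate": 0.3,
}
model_overrides, trainer_kwargs = split_kwargs(example_args)
print(model_overrides)
print(trainer_kwargs)
```

Here trainer_kwargs could then be passed to Seq2SeqTrainingArguments, while model_overrides would be applied when the model is created.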

I also tried to follow the solution in "Transformers pretrained model with dropout setting", but I couldn't work out where that code should go.

  • The dropout is a model parameter and not a trainer parameter. Why don't you change it on the model level? – cronoik May 23 '23 at 18:51
  • Oh, I was building a new framework around the T5 model. I think I misunderstood the difference between the model and the trainer. Thank you for your comment! – hyewwns May 24 '23 at 16:23
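
Following the comment above, a minimal sketch of setting dropout at the model level might look like the following. It assumes torch and transformers are installed, and it builds a tiny, randomly initialised T5 so that nothing has to be downloaded; the layer sizes are arbitrary illustration values, not those of the real checkpoint. With a pretrained checkpoint, the same override can usually be passed as a keyword argument to from_pretrained, as shown in the trailing comment.

```python
import torch.nn as nn
from transformers import T5Config, T5ForConditionalGeneration

# Tiny config for illustration only; the sizes are assumptions chosen to
# keep the example fast, not the values of the real model.
config = T5Config(
    vocab_size=128,
    d_model=32,
    d_kv=8,
    d_ff=64,
    num_layers=1,
    num_heads=4,
    dropout_rate=0.3,  # the model-level dropout setting
)
model = T5ForConditionalGeneration(config)

# Collect the probabilities of all nn.Dropout modules inside the model.
dropout_ps = {m.p for m in model.modules() if isinstance(m, nn.Dropout)}
print(model.config.dropout_rate, dropout_ps)

# With a real checkpoint (requires a download), the override is typically
# passed when loading, because extra keyword arguments that match config
# attributes update the config:
# model = T5ForConditionalGeneration.from_pretrained(
#     "allenai/tk-instruct-base-def-pos", dropout_rate=0.3
# )
```

The resulting model can then be handed to Seq2SeqTrainer as before. Dropout is only active in training mode, which the trainer enables during training.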
