
I'd like to finetune StarCoder (https://huggingface.co/bigcode/starcoder) on my own dataset, on a GCP VM instance.

The documentation says that training the model used 512 Tesla A100 GPUs and took 24 days.

I also saw the model (.bin) files in the Files section on Hugging Face (https://huggingface.co/bigcode/starcoder/tree/main).

The total size of the model files is ~64 GB.

Based on all this information,

  1. How do I decide which GPU is best for finetuning on my dataset?
  2. How do I estimate how long finetuning will take (based on assumptions such as epochs = 1, for instance)?
  3. Are there any other factors to consider when choosing hardware or estimating time?
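
For context, here is my rough back-of-envelope attempt so far. The bytes-per-parameter figures and the ~6 FLOPs per parameter per token rule are standard approximations, and the dataset size, GPU throughput, and utilization numbers are purely illustrative assumptions, not measurements:

```python
# Back-of-envelope sizing for FULL finetuning of StarCoder (~15.5B params).
# Assumptions: Adam optimizer in mixed precision, and the common
# ~6 FLOPs per parameter per training token rule of thumb.

PARAMS = 15.5e9  # StarCoder parameter count

# Memory: fp16 weights (2 B) + fp16 gradients (2 B) + Adam states in fp32
# (master weights 4 B + momentum 4 B + variance 4 B) = 16 B per parameter,
# before activations and framework overhead.
mem_gb = PARAMS * 16 / 1e9
print(f"~{mem_gb:.0f} GB for weights + gradients + optimizer states")

# Time: total FLOPs ≈ tokens * 6 * params, divided by sustained throughput.
# Illustrative assumptions: 1 epoch over a 100M-token dataset on a single
# A100 sustaining ~150 TFLOP/s (bf16 at roughly 50% utilization).
tokens = 100e6
seconds = tokens * 6 * PARAMS / 150e12
print(f"~{seconds / 3600:.1f} GPU-hours for one epoch")
```

If this math is roughly right, full finetuning needs far more than one GPU's memory, which I assume is why people use sharding or parameter-efficient methods, but I'd like to sanity-check the reasoning.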