I'd like to finetune StarCoder (https://huggingface.co/bigcode/starcoder) on my dataset, on a GCP VM instance.
The documentation says that training the model took 24 days on 512 A100 GPUs.
I also looked at the model weight (.bin) files in the Files section on Hugging Face (https://huggingface.co/bigcode/starcoder/tree/main); their total size is ~64 GB.
Based on all this information,
- How do I decide which GPU is best suited for finetuning on my dataset?
- How can I estimate how long finetuning will take (based on assumptions such as epochs=1, for instance)?
- Are there any other factors to consider when choosing hardware or estimating training time?
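To make the question concrete, here is my rough back-of-envelope attempt at both estimates. Everything here is an assumption: the ~15.5B parameter count is from the StarCoder model card, the 16-bytes-per-parameter figure assumes full finetuning with Adam in mixed precision, the "6 × params × tokens" FLOPs rule is a common heuristic, and the dataset size, GPU count, and ~30% utilization are numbers I made up. Is this roughly the right way to reason about it?

```python
# Back-of-envelope sizing for FULL finetuning of StarCoder.
# Assumptions (none of these are measured):
#   - ~15.5B parameters (StarCoder model card)
#   - mixed-precision training with Adam:
#       2 B fp16 weights + 2 B fp16 grads
#       + 4 B fp32 master weights + 4 B Adam m + 4 B Adam v = 16 B/param
#   - "6 * params * tokens" FLOPs per token for forward + backward
#   - A100 peak of ~312 TFLOPs (bf16/fp16 dense), ~30% utilization

PARAMS = 15.5e9
GIB = 1024**3

# --- memory estimate (weights + grads + optimizer states only,
#     activations and framework overhead come on top of this) ---
bytes_per_param = 16
train_mem_gib = PARAMS * bytes_per_param / GIB

# --- time estimate for one epoch ---
dataset_tokens = 100e6           # assumed: 100M tokens in my dataset
flops_per_token = 6 * PARAMS     # forward + backward rule of thumb
a100_peak_flops = 312e12         # per-GPU peak, bf16/fp16
mfu = 0.3                        # assumed model FLOPs utilization
n_gpus = 8

total_flops = dataset_tokens * flops_per_token
seconds = total_flops / (n_gpus * a100_peak_flops * mfu)

print(f"training state memory: ~{train_mem_gib:.0f} GiB")
print(f"one epoch on {n_gpus} A100s: ~{seconds / 3600:.1f} hours")
```

By this arithmetic, full finetuning needs on the order of ~230 GiB just for the training state, which already rules out any single GPU and suggests either multi-GPU sharding (e.g. DeepSpeed ZeRO/FSDP) or a parameter-efficient method like LoRA instead. Does that conclusion sound right?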