Can Vertex AI Training be used for distributed training with the Hugging Face Trainer and DeepSpeed? All the examples I have seen use the native PyTorch distributed strategy.
It would be very helpful if someone could tell me:
- Whether DeepSpeed is supported
- How to integrate DeepSpeed when doing multi-node training on Vertex AI
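
For context, here is a minimal sketch of the kind of setup in question. It is not a confirmed working recipe. It assumes that Vertex AI exposes the cluster layout to each replica in a `CLUSTER_SPEC` environment variable (JSON with `cluster` and `task` keys), and that a DeepSpeed config dict with `"auto"` values can be handed to the Hugging Face `TrainingArguments(deepspeed=...)` parameter. The helper names `node_info_from_cluster_spec` and `zero2_config` are hypothetical, and the host names in the example are made up.

```python
import json
import os


def node_info_from_cluster_spec(spec_json):
    """Return (node_rank, num_nodes, master_host) from a CLUSTER_SPEC-style JSON string.

    Assumed layout (an assumption, check the Vertex AI docs):
    {"cluster": {"workerpool0": ["host:port", ...], ...},
     "task": {"type": "workerpool1", "index": 0}}
    """
    spec = json.loads(spec_json)
    cluster = spec["cluster"]
    task = spec["task"]
    # Sort pool names so that workerpool0 (the primary replica) comes first.
    pools = sorted(cluster)
    hosts = [host for pool in pools for host in cluster[pool]]
    # Nodes in earlier pools are counted before this task's own index.
    offset = sum(len(cluster[pool]) for pool in pools if pool < task["type"])
    node_rank = offset + int(task["index"])
    # Drop the port from the first host to get a rendezvous address.
    master_host = hosts[0].rsplit(":", 1)[0]
    return node_rank, len(hosts), master_host


def zero2_config():
    """A small ZeRO stage 2 config; "auto" lets the HF Trainer fill in the values."""
    return {
        "zero_optimization": {"stage": 2},
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "fp16": {"enabled": "auto"},
    }


if __name__ == "__main__":
    # Fallback example with made-up hosts, used when CLUSTER_SPEC is not set.
    example = json.dumps({
        "cluster": {"workerpool0": ["node-a:2222"], "workerpool1": ["node-b:2222"]},
        "task": {"type": "workerpool1", "index": 0},
    })
    rank, num_nodes, master = node_info_from_cluster_spec(
        os.environ.get("CLUSTER_SPEC", example)
    )
    print(f"node_rank={rank} num_nodes={num_nodes} master={master}")
    # The dict below is what would be passed as TrainingArguments(deepspeed=...).
    print(json.dumps(zero2_config(), indent=2))
```

The derived node rank, node count, and master host are the values a multi-node launcher would need on each replica. Whether Vertex AI's networking and containers allow DeepSpeed to use them this way is exactly what I am unsure about.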