I am training a transformers model on several GPUs (3 out of 8) and want to kill the training on specific GPUs only (0, 6, 7).
I tried the top command, but it only shows PIDs, and I don't know which GPU each PID belongs to.
I do not want to use kill -9 blindly, because I don't know which GPU would stop: I want to stop the processes on GPUs 0, 6, and 7 and keep the others running.
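For reference, one possible way to get the PID-to-GPU mapping is sketched below. It assumes nvidia-smi is on the PATH and uses its CSV query options (`--query-gpu=index,uuid` and `--query-compute-apps=pid,gpu_uuid`), joining the two outputs on the GPU UUID. The helper names (`parse_gpu_index_by_uuid`, `parse_pids_by_gpu`, `pids_on_gpus`) are hypothetical and chosen for illustration only; this is a sketch, not a tested tool.

```python
import subprocess


def parse_gpu_index_by_uuid(csv_text):
    """Parse 'index, uuid' rows into a {uuid: index} dict."""
    mapping = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, uuid = [field.strip() for field in line.split(",")]
        mapping[uuid] = int(index)
    return mapping


def parse_pids_by_gpu(apps_csv_text, index_by_uuid):
    """Parse 'pid, gpu_uuid' rows into a {gpu_index: [pid, ...]} dict."""
    pids_by_gpu = {}
    for line in apps_csv_text.strip().splitlines():
        if not line.strip():
            continue
        pid, uuid = [field.strip() for field in line.split(",")]
        gpu_index = index_by_uuid.get(uuid)
        if gpu_index is None:
            continue  # process on a GPU that was not listed; skip it
        pids_by_gpu.setdefault(gpu_index, []).append(int(pid))
    return pids_by_gpu


def run_nvidia_smi(query_flag):
    """Run nvidia-smi with one query flag and return its CSV output."""
    result = subprocess.run(
        ["nvidia-smi", query_flag, "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def pids_on_gpus(target_gpus):
    """Return {gpu_index: [pid, ...]} restricted to the requested GPUs."""
    index_by_uuid = parse_gpu_index_by_uuid(
        run_nvidia_smi("--query-gpu=index,uuid"))
    pids_by_gpu = parse_pids_by_gpu(
        run_nvidia_smi("--query-compute-apps=pid,gpu_uuid"), index_by_uuid)
    return {gpu: pids for gpu, pids in pids_by_gpu.items()
            if gpu in target_gpus}


# Example (requires an NVIDIA driver, so it is left commented out):
# print(pids_on_gpus({0, 6, 7}))
```

The PIDs returned for GPUs 0, 6, and 7 could then be stopped one by one (for example with kill followed by the PID), leaving the processes on the other GPUs untouched. Note that a single PID may appear under several GPUs if one process uses more than one device.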
I reproduced the problem with a small example:
from accelerate import Accelerator, notebook_launcher
from accelerate.utils import set_seed

def training_loop():
    set_seed(42)
    accelerator = Accelerator(mixed_precision="fp16")
    print("Hello There!")

# Pass the function itself, not the result of calling it
notebook_launcher(training_loop, num_processes=2)
Launching the script from the terminal:
CUDA_VISIBLE_DEVICES=0,6,7 python3 AccelerateTrainer.py
After the kill, I expect nvidia-smi to show 0% for all three GPUs (0, 6, and 7).