File exists: '/opt/ml/input/data/log_dir/story-visualization-0519-v2-768x480-20x12-lr5e-05'    
Traceback (most recent call last):
      File "/opt/ml/code/deepspeed_tools/abstract_trainer_deepspeed.py", line 249, in <module>
        model_engine.save_checkpoint(config.model_dir, epoch, client_state=client_sd)
      File "/opt/conda/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 2717, in save_checkpoint
        os.makedirs(save_dir, exist_ok=True)
      File "/opt/conda/lib/python3.8/os.py", line 223, in makedirs
        mkdir(name, mode)

I thought that passing exist_ok=True to os.makedirs meant it would never raise a "file exists" exception, but I still get this error. Any suggestions? The failing call is: os.makedirs(save_dir, exist_ok=True)

I am not sure whether this happens because my code is running in multiple processes.
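
For reference, here is a minimal sketch of what exist_ok=True does and does not suppress. It runs in a throwaway temporary directory; `makedirs_outcome` and the two path names are hypothetical and used only for illustration. With exist_ok=True, an existing directory is accepted, but os.makedirs still raises when the existing path is not a directory.

```python
import os
import tempfile


def makedirs_outcome(path):
    """Return 'ok', or the name of the exception raised by os.makedirs."""
    try:
        os.makedirs(path, exist_ok=True)
        return "ok"
    except OSError as exc:
        return type(exc).__name__


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: the target already exists as a directory.
    existing_dir = os.path.join(tmp, "already_a_dir")
    os.mkdir(existing_dir)

    # Case 2: the target already exists as a regular file.
    existing_file = os.path.join(tmp, "already_a_file")
    open(existing_file, "w").close()

    dir_result = makedirs_outcome(existing_dir)
    file_result = makedirs_outcome(existing_file)
    print(dir_result, file_result)
```

So the flag only covers the case where os.path.isdir reports the existing path as a directory at the moment of the check.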

Hypnoz
  • I suspect that this means that the directory name you're trying to create already exists, *as a normal file*. – jasonharper May 21 '22 at 02:36
  • @jasonharper But I checked my directory: it is a folder, and there is no regular file with the same name (nor any logic that would create one). Do you think it is related to running multiple processes? – Hypnoz May 21 '22 at 03:38
  • The error happens randomly (around 30% of the time); most of the time it is okay. If I restart my task, it works fine, but I need to run it multiple times. – Hypnoz May 21 '22 at 03:38
  • Take a look at https://stackoverflow.com/questions/273192/how-can-i-safely-create-a-nested-directory – Сергей Кох May 21 '22 at 05:10
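
If the intermittent failure does come from several processes creating the same directory at once, or from a filesystem that briefly does not report the new path as a directory (both are assumptions, not confirmed in this thread), one possible workaround is to create the directory defensively before saving. The sketch below is not part of DeepSpeed; `ensure_dir`, `retries`, and `delay` are hypothetical names chosen for illustration.

```python
import os
import time


def ensure_dir(path, retries=5, delay=0.2):
    """Create `path` if needed, tolerating a concurrent creator.

    Hypothetical helper: retries when os.makedirs raises FileExistsError
    and the path is not (yet) visible as a directory.
    """
    last_error = None
    for _ in range(max(1, retries)):
        try:
            os.makedirs(path, exist_ok=True)
            return path
        except FileExistsError as exc:
            # Another process may have created the directory in the meantime.
            if os.path.isdir(path):
                return path
            last_error = exc
            time.sleep(delay)
    # The path still exists as something other than a directory.
    raise last_error
```

In the script from the traceback, this could be called as `ensure_dir(config.model_dir)` on each process before `model_engine.save_checkpoint(...)`. Whether that removes the error in this setup is untested here.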

0 Answers