I have an if statement that checks to see if a directory already exists:
if not os.path.exists(os.path.dirname(save_to)):
    os.makedirs(os.path.dirname(save_to))
After this point, files are added to the directory save_to, whether or not it previously existed.
Sometimes, the code inside the if statement is executed even though the directory already exists. It seems to happen at random.
I believe this is happening because I'm using multiprocessing.Pool.map to assign this task to several CPUs. I think processes 1 and 2 both get inside the if statement before the directory exists. Process 1 then creates the directory, and process 2 tries to create it as well and fails.
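The suspected interleaving can be replayed deterministically in a single process. This is only a sketch of the hypothesis, not the actual worker code: the function name and the temporary path are made up for illustration, and the two "workers" are simulated by running both checks before either create.

```python
import os
import tempfile


def simulate_check_then_create_race():
    """Replay the suspected interleaving of two workers in one process."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical target directory, standing in for os.path.dirname(save_to).
        target = os.path.join(root, "clipped", "aldfly")

        # Both workers run the existence check before either creates anything.
        worker1_sees_missing = not os.path.exists(target)
        worker2_sees_missing = not os.path.exists(target)

        # Worker 1 creates the directory first and succeeds.
        if worker1_sees_missing:
            os.makedirs(target)

        # Worker 2 acts on its now-stale check result and fails.
        try:
            if worker2_sees_missing:
                os.makedirs(target)
        except FileExistsError:
            return True
        return False


race_reproduced = simulate_check_then_create_race()
```

If the hypothesis holds, `race_reproduced` is `True`: the check and the create are two separate steps, so the result of the check can be out of date by the time `os.makedirs` runs.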
This is the error I'm getting:
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/WNeill/anaconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/WNeill/anaconda3/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/WNeill/who-said-what/wsw/preprocessing.py", line 147, in clip_audio ****!!!****
    os.makedirs(os.path.dirname(save_to)) ****!!!****
  File "/home/WNeill/anaconda3/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/home/WNeill/clipped/clipped/aldfly'
I can't think of any other reason why line 147, which corresponds to the code snippet above (and is marked in the stack trace), would execute.
Question:
How can I fix this issue, regardless of whether my hypothesis is correct?
Proposed Solution:
My only thought is to use the argument exist_ok=True and get rid of the if statement. I'm afraid of overwriting work if I use this approach, though. I've got about 8 hours of processing to do, and I'd hate it if something got deleted or overwritten.
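A minimal sketch of that proposal follows. The helper name `ensure_parent_dir` and the file names are hypothetical. As far as the documented behavior goes, `exist_ok=True` only suppresses the `FileExistsError` when the target directory is already there; `os.makedirs` does not delete or modify anything inside an existing directory.

```python
import os
import tempfile


def ensure_parent_dir(save_to):
    """Create the parent directory of save_to if it is missing."""
    parent = os.path.dirname(save_to)
    if parent:  # dirname is '' when save_to is a bare filename
        # No separate existence check, so there is no window for a race.
        os.makedirs(parent, exist_ok=True)
    return parent


with tempfile.TemporaryDirectory() as root:
    # Hypothetical output paths that share one parent directory.
    first = os.path.join(root, "clipped", "aldfly", "a.wav")
    second = os.path.join(root, "clipped", "aldfly", "b.wav")

    parent = ensure_parent_dir(first)
    with open(first, "w") as f:
        f.write("existing work")

    # Second call hits the already-existing directory and leaves it untouched.
    ensure_parent_dir(second)

    with open(first) as f:
        surviving_content = f.read()
    surviving_files = sorted(os.listdir(parent))
```

Note that this only covers the directory. Whether an output file itself gets overwritten depends on how it is opened; for example, `open(path, "x")` raises `FileExistsError` instead of replacing an existing file.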