I have an if statement that checks to see if a directory already exists:

if not os.path.exists(os.path.dirname(save_to)):
    os.makedirs(os.path.dirname(save_to))

After this point, files are added to the directory save_to, whether or not it previously existed.

SOMETIMES, the code inside the if statement is executed even though the directory already exists. It seems to happen at random.

I BELIEVE that this is occurring because I'm using multiprocessing.Pool.map to assign this task to several CPUs. I think processes 1 AND 2 both pass the check before either has created the directory. Process 1 then creates the directory, and process 2 tries to create it too and fails.

This is the error I'm getting:

multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/WNeill/anaconda3/lib/python3.8/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/WNeill/anaconda3/lib/python3.8/multiprocessing/pool.py", line 51, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "/home/WNeill/who-said-what/wsw/preprocessing.py", line 147, in clip_audio ****!!!****
    os.makedirs(os.path.dirname(save_to))                                         ****!!!****
  File "/home/WNeill/anaconda3/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: '/home/WNeill/clipped/clipped/aldfly'

I can't think of any other reason for line 147, which corresponds to the code snippet above (also marked in the stack trace), to execute.

Question:

How can I combat this issue (regardless of whether my hypothesis is correct)?

Proposed Solution:

My only thought is to use the argument exist_ok=True and get rid of the if statement. I'm afraid of overwriting work if I use this approach, though. I've got about 8 hours of processing to do, and I'd hate it if something got deleted or overwritten.
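
On the overwriting concern: as documented for the standard library, exist_ok=True only suppresses the FileExistsError when the target directory is already there; it does not empty or recreate the directory. A minimal sketch for checking this in a throwaway temporary directory (the function name, folder layout, and file name are made up for illustration):

```python
import os
import tempfile


def ensure_dir_keeps_files():
    """Return the contents of a file that existed before makedirs(exist_ok=True)."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout, loosely modelled on the path in the traceback.
        target = os.path.join(root, "clipped", "aldfly")
        os.makedirs(target)

        # Simulate work that was saved before the second makedirs call.
        marker = os.path.join(target, "already_here.txt")
        with open(marker, "w") as f:
            f.write("earlier work")

        # Second call: the directory exists, so nothing is created or removed.
        os.makedirs(target, exist_ok=True)

        with open(marker) as f:
            return f.read()


if __name__ == "__main__":
    print(ensure_dir_keeps_files())
```

Files written into the directory afterwards can still overwrite each other if two tasks use the same file name, but that is independent of how the directory was created.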

rocksNwaves
  • Gibbs' Rule #18: Don't ask before, create the directory and handle the exception if one is raised. – Klaus D. Aug 20 '20 at 18:00
  • @KlausD. Better to ask forgiveness than permission. I read that in the PEP(?) coding styles guide just last week. I shoulda thought of that (but I'll give myself a pass since I'm still learning). Thanks for the reminder! – rocksNwaves Aug 20 '20 at 18:02

1 Answer

A somewhat heavyweight solution is to share a lock between the processes, as described in the post "Python sharing a lock between processes".

You could use the Manager and Lock mentioned there to create a critical section around that part of the code. In other words, the first process that gets there prevents the other processes from executing that part of the code; only after the lock is released can they continue on their merry way.
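
A rough sketch of what that critical section might look like, assuming the worker receives the lock as an extra argument. The names ensure_dir and clip_audio and the file names are placeholders, not the code from the question:

```python
import multiprocessing
import os
import tempfile


def ensure_dir(lock, save_to):
    """Create the parent directory of save_to inside a critical section."""
    directory = os.path.dirname(save_to)
    with lock:  # only one process at a time runs the check-and-create
        if not os.path.exists(directory):
            os.makedirs(directory)
    return directory


def clip_audio(lock, save_to):
    # Hypothetical stand-in for the real worker: it only writes a small file.
    ensure_dir(lock, save_to)
    with open(save_to, "w") as f:
        f.write("clip")
    return save_to


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root, multiprocessing.Manager() as manager:
        # A manager lock is a proxy object, so it can be passed to pool workers.
        lock = manager.Lock()
        paths = [os.path.join(root, "aldfly", f"clip_{i}.wav") for i in range(8)]
        with multiprocessing.Pool(4) as pool:
            results = pool.starmap(clip_audio, [(lock, p) for p in paths])
        print(len(results))
```

The lock serializes only the directory check, so the actual processing still runs in parallel. It also only protects against processes that share this particular lock, which is one reason the exception-based approach is often simpler.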

Grinjero
  • Thank you :) I chose to go with using a try-except block, but I like your answer. I'm going to select yours as the solution, because it works and because no one else wants to try. Cheers! – rocksNwaves Aug 21 '20 at 00:17
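
The try-except approach mentioned in the comments could look roughly like this. It is a sketch, not the asker's actual code: ensure_parent_dir is a made-up helper name, and it assumes save_to always includes a directory component.

```python
import os
import tempfile


def ensure_parent_dir(save_to):
    """Create the parent directory of save_to; return True if this call created it."""
    directory = os.path.dirname(save_to)
    try:
        os.makedirs(directory)
    except FileExistsError:
        # Another process (or an earlier call) got there first.
        # Re-raise if the existing path is a file rather than a directory.
        if not os.path.isdir(directory):
            raise
        return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "aldfly", "clip.wav")
        print(ensure_parent_dir(target))  # first call creates the directory
        print(ensure_parent_dir(target))  # second call finds it already there
```

Because there is no separate existence check, there is no window between checking and creating in which another process can slip in; losing the race simply lands in the except branch.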