I've got a workflow that looks like this:
    for i in some_list:
        if i not in os.listdir(a_directory):
            x = do_something(i)
            x.to_pickle(f"{a_directory}/{i}")
The os.listdir call is expensive, because the directory is huge and because it's on a network file system.
I have multiple workers doing this job, so I can't just list the contents of the directory once. If I do, my workers will duplicate each other's work, and do_something is, after all, more expensive than os.listdir.
Is there something that checks for the presence of a specific file, rather than dumping all of the file names into a Python list for me to string match on?
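To make the question concrete, here is a minimal sketch of the kind of per-file check being asked about. It assumes os.path.exists is acceptable here: it stats the single path instead of listing the directory. The helper name needs_work and the temporary directory are hypothetical stand-ins for illustration, not part of the original workflow.

```python
import os
import tempfile


def needs_work(directory, name):
    """Return True if `name` is not yet present in `directory`.

    os.path.exists checks the one path (a stat call) rather than
    building a list of every entry in the directory.
    """
    return not os.path.exists(os.path.join(directory, name))


# Hypothetical demo: a temporary directory stands in for a_directory.
demo_dir = tempfile.mkdtemp()

# Simulate a result that another worker has already written.
with open(os.path.join(demo_dir, "done.pkl"), "wb"):
    pass

todo = [n for n in ["done.pkl", "todo.pkl"] if needs_work(demo_dir, n)]
```

In the loop above, the condition would become `if needs_work(a_directory, i):`. Note that two workers can still both see a file as missing between the check and the write, so this reduces duplicated work rather than ruling it out.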