
I am using the following code snippet to parallelize a process (creation of embeddings for a neural network):

```python
from multiprocessing import Pool, cpu_count
import pickle

# ...

num_processes = max(1, cpu_count() // 2)  # Use half of the available CPU cores, but at least 1

def process_mol(index):
    # Get the molecule from the original list using its index
    obmol = obmol_list[index]

    result = mol2vec(obmol)

    return result


if __name__ == '__main__':

    print("computing embeddings...")

    # Create a pool of worker processes
    pool = Pool(processes=num_processes)

    # Create a list of indices corresponding to the positions of the molecules in obmol_list
    indices = range(len(obmol_list))

    # Parallelize the mol2vec call across the OBMols in obmol_list
    data_list = pool.map(process_mol, indices)

    # Close the pool to free up resources
    pool.close()
    pool.join()

    # Pickle the results to disk
    with open('pickled_data/data_list.pkl', 'wb') as f:
        pickle.dump(data_list, f)
# ...
```

However, I am getting the following error:

```
multiprocessing.pool.MaybeEncodingError: Error sending result: '[Data(x=[16, 396],... Reason: 'OSError(24, 'Too many open files')'
```

I am already using only half of the available CPU cores, but my data set (1.6 million elements) is rather large. Any ideas on how to solve this problem?

Limmi
  • [This question](https://stackoverflow.com/questions/45665991/multiprocessing-returns-too-many-open-files-but-using-with-as-fixes-it-wh) suggests using `with multiprocessing.Pool(nprocess) as pool:` can fix this problem. – Nick ODell Aug 07 '23 at 16:28
  • What does *mol2vec()* do? Also (not related to the issue at hand) why are you even writing *process_mol()* when you could just write *pool.map(mol2vec, obmol_list)*? Are you sure that the objects returned by *mol2vec()* can be pickled? Do they contain open file descriptors? – DarkKnight Aug 07 '23 at 16:33
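Combining the two comments above, a minimal sketch might manage the pool with a `with` block and pass `mol2vec` straight to `pool.map` instead of going through an index wrapper. Because the real `mol2vec()` and `obmol_list` are not shown in the question, the `mol2vec` below is a hypothetical stand-in (it just returns the string length of its input), and `compute_embeddings` is a name introduced only for illustration. Whether this alone removes the "Too many open files" error for the real data is not established here.

```python
from multiprocessing import Pool, cpu_count

def mol2vec(mol):
    # Hypothetical stand-in for the real mol2vec(); NOT the question's implementation.
    return [len(str(mol))]

def compute_embeddings(obmol_list):
    num_processes = max(1, cpu_count() // 2)  # half the cores, at least 1
    # The with-block shuts the pool down (including its pipes) on exit,
    # even if an exception is raised while mapping.
    with Pool(processes=num_processes) as pool:
        # Pass mol2vec directly; no index-based wrapper is needed.
        return pool.map(mol2vec, obmol_list)

if __name__ == '__main__':
    print(compute_embeddings(['C', 'CC', 'CCO']))  # [[1], [2], [3]]
```

Note that `Pool.__exit__` calls `terminate()`, so all results must be collected (as `pool.map` does) before the `with` block ends.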



Here are some possible reasons:

  • Other processes may be opening files without closing them properly, using up a lot of file handles.
  • If `mol2vec` opens files without a `with` statement, the handles may not be released promptly, so too many files can end up open at once.
  • While debugging, you may have left behind stray ("zombie") processes that still hold open file handles; these need to be killed.
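To check whether the process really is running out of descriptors, it can help to compare how many it has open with its limit. `OSError(24, ...)` is `EMFILE`, which is raised when a single process reaches its own limit on open file descriptors. The sketch below assumes a Unix-like system (the `resource` module and the `/proc/self/fd` or `/dev/fd` directories are not available on Windows); `open_fd_count` and `fd_limits` are hypothetical helper names, not part of any library.

```python
import os
import resource  # Unix-only standard-library module

def open_fd_count():
    """Approximate number of file descriptors open in this process.

    The count includes the descriptor used briefly by os.listdir itself.
    """
    for fd_dir in ('/proc/self/fd', '/dev/fd'):  # Linux, then macOS/BSD
        if os.path.isdir(fd_dir):
            return len(os.listdir(fd_dir))
    raise OSError("no per-process fd directory found on this platform")

def fd_limits():
    # RLIMIT_NOFILE is the per-process limit on open file descriptors.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard

if __name__ == '__main__':
    soft, hard = fd_limits()
    print(f"open fds: {open_fd_count()}, soft limit: {soft}, hard limit: {hard}")
```

Calling `open_fd_count()` periodically (for example every few thousand molecules) would show whether the count keeps climbing during the run.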

Try restarting the machine, or find and kill any suspicious leftover processes. Also make sure to use a `with` statement whenever you open files elsewhere in your code.
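As an illustration of the `with` advice, a helper that reads a file inside a worker could look like the sketch below. `read_lines` is a hypothetical function, not something from `mol2vec`; the point is only that the file is closed as soon as the block exits, rather than whenever the file object happens to be garbage-collected.

```python
import os
import tempfile

def read_lines(path):
    """Read a text file and return its lines without trailing newlines."""
    with open(path) as f:
        lines = f.read().splitlines()
    # The with-block has closed f by this point; it would also have been
    # closed if read() had raised an exception.
    assert f.closed
    return lines

if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'example.txt')
        with open(path, 'w') as f:
            f.write('C\nCC\nCCO\n')
        print(read_lines(path))  # ['C', 'CC', 'CCO']
```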

Andrew Louw