The following function was used to check if elements of list A(files_list) are in list B(reference_list). However, the reference_list consisted five million entry, and checking the code take too much time.
def check_files_in_list(file_list, reference_list):
not_found_files = []
for file_path in file_list:
if file_path not in reference_list:
not_found_files.append(file_path)
return not_found_files
Is there any way to accelerate this process? A different data structure, etc.? I tried some simple parallelization, but it did not accelerate the process. Had python already implemented the parallelization if file_path not in reference_list
by default?