I am downloading files from a server, skipping any files that have already been downloaded. To do that, I save the names of downloaded files in a DB. When the application starts, this list of file names is loaded from the DB into a set, as shown below. I then run a find command on the server to get the list of files to be downloaded, and each name is compared as follows:
file_list_for_delta = set()  # retrieved from DB
for file_name in file_list:  # retrieved from the server using the find command
    if file_name in str(file_list_for_delta):
        return True
The problem is that comparing each file name takes a very long time. There are at least 500,000 records in the DB, and around 20,000 file names are retrieved from the server every time the script runs.
What is the most efficient approach or data structure I can use in place of set()?
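For comparison, here is a minimal sketch of the same check done directly against the set, assuming the slowdown comes from the `str(file_list_for_delta)` call. That call builds a string from the whole set on every iteration and then runs a substring search, which is linear in the size of the set and can also match partial names. A membership test on the set itself is a hash lookup, which is constant time on average. The function names and sample data below are hypothetical, chosen only for illustration.

```python
def already_downloaded(file_name, downloaded_names):
    """Return True if file_name is an exact member of the set (hash lookup)."""
    return file_name in downloaded_names


def files_to_download(server_files, downloaded_names):
    """Return the server files that are not in the downloaded set, in order."""
    return [name for name in server_files if name not in downloaded_names]


# Hypothetical sample data standing in for the DB contents and the find output.
file_list_for_delta = {"a.log", "b.log", "old_report.csv"}
file_list = ["a.log", "c.log", "report.csv"]

new_files = files_to_download(file_list, file_list_for_delta)
print(new_files)  # ['c.log', 'report.csv']
```

Note that "report.csv" is a substring of "old_report.csv", so the `str(...)` version would treat it as already downloaded, while the direct set lookup does not. If only the new names are needed and order does not matter, `set(file_list) - file_list_for_delta` should give the same result as a set.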