-4

I am downloading files from a server ,but ignore the files which are already downloaded ,so for that purpose I have saved the names of downloaded files in a DB ,when the application starts this list of file names from DB is stored in a set as shown below. Then I run find command on the server and get the list of files to be downloaded and comparison is made as below:

file_list_for_delta = set() #retrieved from DB

for file_name in file_list: #retrieved this list from server using find command
    if file_name in str(file_list_for_delta):
        return True

The problem is that this comparison of each file name takes huge time ,there are at least 500000 records in DB and around 20000 file names that are retrieved from server every time the script run.

what is the most efficient way / Data structure i can use in place of set() .

holdenweb
  • 33,305
  • 7
  • 57
  • 77
Sharma
  • 17
  • 4
  • 2
    Do the comparison on the server side, most database management systems are explicitely optimized for this. – Jan Jul 25 '20 at 07:58
  • 8
    Why on Earth are you converting the `set` to a string before checking for membership? You already _had_ the correct structure to check for members in O(n) time and then you throw it away by converting to a string _on every iteration_. Set membership is O(1) and you're looking through the results, making it O(n). I'm not even sure of the complexity of your current approach when the string conversion is thrown in – roganjosh Jul 25 '20 at 07:58
  • https://wiki.python.org/moin/TimeComplexity – hjpotter92 Jul 25 '20 at 08:00
  • `set` is the fastest. – Andrej Jul 25 '20 at 08:24
  • 2
    Your current solution only works by accident! `str(your_set)` is a single string meant to be read, not to be searched. The point of a set is to look up items in it directly: `if obj in your_set`. – Andras Deak -- Слава Україні Jul 25 '20 at 09:03
  • You might want to read this: https://stackoverflow.com/questions/54492158/how-to-efficiently-search-for-a-list-of-strings-in-another-list-of-strings-using/63071541#63071541 – zabop Jul 25 '20 at 09:44

1 Answers1

1

Your code is almost correct, but testing for set membership is nowhere near as complicated as you made it (and your version only "works by accident"). See if this works faster - it doesn't convert the set to a string each time it searches for a file name!

file_set = set() #retrieved from DB

for file_name in file_list: #retrieved this list from server using find command
    if file_name in file_set:
        return True

You will also note I have renamed the binding for the set, since it's very misleading to have a set variable whose name contains the word "list" ;-).

You don't show us how you populate the set from the database, so I assumed the integrity of that process was not in question.

holdenweb
  • 33,305
  • 7
  • 57
  • 77