total_files_set = set(total_files)
files_downloaded_set = set(files_downloaded)
files_not_dowloaded_set = total_files_set - files_downloaded_set
list_of_files_not_dowloaded = list(files_not_dowloaded_set)
Or, if you prefer a one-liner:
list_of_files_not_dowloaded = list(set(total_files) - set(files_downloaded))
To learn more about the operations that sets support, see the Python documentation on set types.
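As a small illustration of the set difference above, here is a sketch with made-up file names (they are hypothetical, not taken from the question). Keep in mind that sets drop duplicates and do not preserve the original order, so the result is sorted here to make it predictable.

```python
# Hypothetical file names, used only for illustration.
total_files = ["a.csv", "b.csv", "c.csv", "d.csv"]
files_downloaded = ["b.csv", "d.csv"]

# Set difference: everything in total_files that is not in files_downloaded.
# sorted() turns the resulting set into a list with a predictable order.
files_not_downloaded = sorted(set(total_files) - set(files_downloaded))

print(files_not_downloaded)  # ['a.csv', 'c.csv']
```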
EDIT:
I timed both methods using two random lists:
- With a smaller list of 50,000 elements and a larger list of 100,000 elements
import timeit
timeit.timeit('l = list(set(l1) - set(l2))',
              setup='import random; l1 = random.sample(range(1000000), 100000); l2 = random.sample(range(1000000), 50000)',
              number=10)
Output:
0.39393879500130424
timeit.timeit('l = [item for item in l1 if item not in l2]',
              setup='import random; l1 = random.sample(range(1000000), 100000); l2 = random.sample(range(1000000), 50000)',
              number=1)
Output (total for a single run, since number is 1 here):
98.58012624000003
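The list comprehension is slow because `item not in l2` scans the whole list for every item. If you need to keep the order (or the duplicates) of the first list, one possible middle ground is to keep the comprehension but test membership against a set. This is a sketch I have not timed above, and `difference_keep_order` is just a name I picked for illustration:

```python
def difference_keep_order(items, excluded):
    """Return the elements of items that are not in excluded, preserving order."""
    excluded_set = set(excluded)  # built once; lookups are O(1) on average
    return [item for item in items if item not in excluded_set]


print(difference_keep_order([5, 3, 5, 1, 4], [3, 4]))  # [5, 5, 1]
```

Unlike `list(set(l1) - set(l2))`, this keeps repeated elements and the original ordering, at the cost of requiring the excluded elements to be hashable.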
If you already have both collections as sets, so no conversion from lists is needed:
timeit.timeit('l = list(s2-s1)',
setup='import random; s1 = set(random.sample(range(1000000), 100000)); s2 = set(random.sample(range(1000000), 50000))',
number = 10)
Output:
0.06160322100004123