Questions about optimization and what is "fastest" involve a lot of variables, so you should always profile with your own data, under the same conditions you expect the code to run under. Consider the following code:
import timeit
import random
import functools

def remove_duplicates_manual(input_list):
    result = []
    k = -1  # index of the current element being checked
    for i in input_list:
        k += 1
        flag = False
        # scan all earlier elements for a duplicate of i
        for j in range(0, k):
            if input_list[j] == i:
                flag = True
        if not flag:
            result.append(i)
    return result

def remove_duplicates_set(input_list):
    return list(set(input_list))

l = [random.randint(0, 10) for x in range(1000)]
rd_manual = functools.partial(remove_duplicates_manual, l)
rd_set = functools.partial(remove_duplicates_set, l)

print(timeit.timeit(rd_manual, number=100))
print(timeit.timeit(rd_set, number=100))
This code produces this output:
3.648878
0.001779
So we can conclude that the set method is generally going to be faster for a list of random integers in the range 0-10, but it may vary for your data and on your system. Also, as mentioned in the comments and other answers, there are a lot of ways you could tweak your manual algorithm to make it faster; one common tweak is sketched below.
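For example, one possible tweak (my own sketch, not the only option) is to track already-seen items in a set so each membership check is O(1) on average, while still preserving the original order:

def remove_duplicates_seen_set(input_list):
    # Hypothetical tweak: keep a set of items already seen, so each
    # membership test is constant time on average instead of scanning
    # the whole prefix of the list, and order is preserved.
    seen = set()
    result = []
    for item in input_list:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result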
Additionally, you should note that the set method may not even be possible for some data. Python sets require hashable elements (just like dictionary keys), so if your list contains mutable data types (like other lists), you won't be able to use list(set(input_list)) directly; a workaround is sketched below.
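One rough workaround (again, just an illustration of the idea) is to convert each sub-list to a tuple, which is hashable, and use those tuples to track what has already been seen:

def remove_duplicate_lists(list_of_lists):
    # Sketch: tuples are hashable, so they can be stored in a set.
    # This assumes the sub-lists themselves contain only hashable elements.
    seen = set()
    result = []
    for sub in list_of_lists:
        key = tuple(sub)
        if key not in seen:
            seen.add(key)
            result.append(sub)
    return result

With this, remove_duplicate_lists([[1, 2], [1, 2], [3]]) would return [[1, 2], [3]].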