I get a list and want to know if all elements are identical.
For lists with a high number of elements, that are indeed all identical, the conversion into a set
is fast, but otherwise going over the list with early exit is performing better:
def are_all_identical_iterate(dataset):
first = dataset[0]
for data in dataset:
if data != first:
return False
return True
# versus
def are_all_identical_all(dataset):
return all(data == dataset[0] for data in dataset)
# or
def are_all_identical_all2(dataset):
return all(data == dataset[0] for data in iter(dataset))
# or
def are_all_identical_all3(dataset):
iterator = iter(dataset)
first = next(iterator)
return all(first == rest for rest in iterator)
NUM_ELEMENTS = 50000
testDataset = [1337] * NUM_ELEMENTS # all identical
from timeit import timeit
print(timeit("are_all_identical_iterate(testDataset)", setup="from __main__ import are_all_identical_iterate, testDataset", number=1000))
print(timeit("are_all_identical_all(testDataset)", setup="from __main__ import are_all_identical_all, testDataset", number=1000))
My results: 0.94 seconds, 3.09 seconds, 3.27 seconds, 2.61 seconds
The for loop takes less than have the time (3x) of the all
function. The all
function is supposed to be the same implementation.
What is going on? I want to know why the loop is so much faster and why there is a difference between the last 3 implementations. The last implementation should have one compare less, because the iterator removes the first element, but that shouldn't have this kind of impact.