Profiling Analysis
Solutions
Which solution is the fastest? There are two clear favorite answers (three solutions in total) that captured most of the votes:
- The solution by Patrick Artner - denoted as PA.
- The first solution by jpp - denoted as jpp1.
- The second solution by jpp - denoted as jpp2.
These were the favorites because they claim to run in O(N), while the other solutions run in O(N^2) or do not guarantee the order of the returned list.
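For context, all three share the same O(N) shape: a single pass over the input, with a set providing O(1) membership tests. A minimal sketch of that pattern (my own illustration, not the exact code posted by either author):

```python
def first_n_unique(iterable, n):
    # Track elements already yielded; set membership is O(1) on average,
    # so a single pass over the input keeps the whole thing O(N).
    seen = set()
    for x in iterable:
        if x not in seen:
            seen.add(x)
            yield x
            if len(seen) == n:  # stop as soon as n unique elements are found
                return
```

For example, `list(first_n_unique([1, 1, 1, 2, 2, 3, 4], 3))` returns `[1, 2, 3]` while preserving the input order.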
Experiment setup
For this experiment 3 variables were considered.
- N elements. The number of unique elements the function is searching for (the first N).
- List length. The longer the list the further the algorithm has to look to find the last element.
- Repeat limit. How many times an element can repeat before the next element occurs in the list. The actual repeat count is uniformly distributed between 1 and the repeat limit.
The assumptions for data generation were as follows. How strict these are depends on the algorithm used; they describe how the data was generated rather than limiting the algorithms themselves.
- An element never occurs again after its repeated sequence first appears in the list.
- The elements are numeric and increasing.
- The elements are of type int.
So in a list such as [1,1,1,2,2,3,4,...], the values 1, 2 and 3 would never appear again. The next element after 4 would be 5, but there could be a random number of 4s (up to the repeat limit) before we see 5.
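A sketch of a generator matching this scheme (hypothetical; the original data-generation code is not shown here, and the name and signature are mine):

```python
import random

def make_dataset(length, repeat_limit):
    # Increasing ints, each repeated a uniform-random 1..repeat_limit
    # times, truncated to the requested list length.
    data, value = [], 1
    while len(data) < length:
        data.extend([value] * random.randint(1, repeat_limit))
        value += 1
    return data[:length]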
A new dataset was created for each combination of variables and re-generated 20 times. The Python timeit
function was used to profile the algorithms 50 times on each dataset. The mean time of the 20x50 = 1000 runs for each combination is reported here. Since the algorithms are generators, their outputs were converted to a list to obtain the execution time.
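The harness looked roughly like the following (a hypothetical reconstruction; the original benchmarking code is not shown in the post, and the `make_data` factory parameter is my own):

```python
import timeit

def mean_time(solution, make_data, n_elements, datasets=20, runs=50):
    # Mean over 20 regenerated datasets x 50 timeit runs = 1000 timings,
    # mirroring the scheme described above.
    total = 0.0
    for _ in range(datasets):
        data = make_data()  # fresh dataset for each of the 20 repetitions
        # list(...) exhausts the generator so its full run is timed
        total += timeit.timeit(lambda: list(solution(data, n_elements)),
                               number=runs)
    return total / (datasets * runs)
```

For example, `mean_time(first_n_unique, lambda: make_dataset(10_000, 100), 200)` would time the sketch above on lists of length 10k with a repeat limit of 100.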
Results
As expected, the more elements searched for, the longer it takes. This graph shows that the execution time is indeed O(N), as claimed by the authors: the straight line on a linear scale is consistent with linear growth.

Fig 1. Varying the first N elements searched for.
None of the three solutions consumes computation time beyond what is required. The image below shows what happens when the list length is fixed rather than the number of elements searched for. Lists of length 10k, with elements repeating at most 100 times (and thus repeating 50 times on average), would on average run out of unique elements around N = 200 (10000/50). If any of these graphs showed an increase in computation time beyond N = 200, that would be a cause for concern.

Fig 2. The effect of first N elements chosen > number of unique elements.
The figure below again shows that processing time increases (at a rate of O(N)) as the algorithm has more data to sift through. The rate of increase is the same as when the first N elements were varied. This is because stepping through the list is the common operation in both cases, and the operation that ultimately determines how fast the algorithm is.

Fig 3. Varying the repeat limit.
Conclusion
The second solution posted by jpp (jpp2) is the fastest of the three in all cases. It is only slightly faster than the solution posted by Patrick Artner, and almost twice as fast as jpp's first solution.