I have been dealing with very small data stores where it really did not matter how wasteful I was when parsing through the data. I recently started work on a data store with records in the 100,000s and am looking into optimizing my algorithms. I just reduced my runtime by a factor of several hundred, and was trying to benchmark a few other solutions. I have a question about terminology:
Is there a well-defined way, like Big O notation, to say "This algorithm takes half the time of that algorithm"?
Big O notation is a well-understood, cross-platform way to describe the time complexity of an algorithm by saying things like: a binary search on an ordered table takes O(log n) time, whereas a search on an unordered table takes O(n) time. (Adrian Mejia: Big O cheatsheet and examples)
Big O notation (and the definition of time complexity) is about growth rates. Algorithms that take 2n, n, or n/2 steps all grow at a linear rate and are all expressed as O(n). Thus, we drop the constant preceding the n when we use Big O notation and accept that it is really only useful for comparing algorithms that take O(1), O(log n), O(n), O(n^2), or other growth rates. (StackOverflow: Why is constant always dropped from big O analysis?)
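For reference, this is the standard textbook definition I am working from (so it is clear exactly where the constant disappears; the symbols c and n_0 are the usual ones from the definition, not anything specific to my problem):

f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 : f(n) \le c \cdot g(n) \text{ for all } n \ge n_0

2n \le 2 \cdot n, \qquad n \le 1 \cdot n, \qquad \tfrac{n}{2} \le 1 \cdot n \quad\Rightarrow\quad 2n,\ n,\ \tfrac{n}{2} \in O(n)

Since the constant 2 (or 1/2) is simply absorbed into c, O(n) cannot distinguish a two-pass algorithm from a one-pass one.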
The best reason I have found for this is that these constants are implementation dependent. If my Windows XP computer from 2002 and your Windows 10 computer from 2019 do the same task, it may take the Windows XP machine 2n time to do what your computer does in n/2 time.
Part of the optimization I have recently done is the following: I have an algorithm in my software that goes through a list of, say, 100,000 data points to get the max and min values. I used to iterate through the entire list to find the max, and then iterate through the entire list again to find the min, in two different functions that were miles apart. I now iterate through it once to find both the max and the min, and then pass the two values around until I need them. If we assume iterating through a list takes n time, then I used to spend 2n time iterating through the list twice, as opposed to now spending n time iterating through it once. It does not matter what hardware you use, an 18-year-old computer or a brand new one: the new algorithm is done in half the time. The new single-pass loop is below, and a sketch of the old two-pass approach follows it.
// Single pass: track both the minimum and the maximum as we go.
int minValue = int.MaxValue;
int maxValue = int.MinValue;
foreach (int entry in myList)
{
    if (entry < minValue) minValue = entry;
    if (entry > maxValue) maxValue = entry;
}
(If you notice that this is C# / .NET and suggest using LINQ instead to speed up the algorithm, you have clearly missed the point of the question.)
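For comparison, a rough sketch of what the old two-pass version amounted to (the real functions were far apart in the codebase and had different names; FindMax and FindMin here are just placeholders that mirror the loop above):

// Old approach (sketch): two separate full passes over the same list.
int FindMax(List<int> list)
{
    int max = int.MinValue;
    foreach (int entry in list)
    {
        if (entry > max) max = entry;
    }
    return max;
}

int FindMin(List<int> list)
{
    int min = int.MaxValue;
    foreach (int entry in list)
    {
        if (entry < min) min = entry;
    }
    return min;
}

// Called from two different places: roughly 2n iterations instead of n.
int maxValue = FindMax(myList);
int minValue = FindMin(myList);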
I have not been able to find a concise, well-understood way to say this, like Big O notation. Big O, Big Omega, Big Theta, and little o notation all deal with time complexity; thus, all are only concerned with growth rates and drop any constant in front of n.
One way I thought of was to benchmark my two implementations and say: for 10,000 points on a Windows 10 production machine, algorithm A took 15 seconds and algorithm B took 7.5 seconds. But I do not care about the absolute times, just that algorithm B runs in half the time.
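To make that concrete, here is a minimal benchmarking sketch of what I mean, assuming Stopwatch from System.Diagnostics, random test data, and C# top-level statements (none of this is my real code):

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Generate some test data; the exact contents do not matter for the comparison.
var rng = new Random(42);
List<int> data = Enumerable.Range(0, 100_000).Select(_ => rng.Next()).ToList();

// Algorithm A: two full passes over the list.
var swA = Stopwatch.StartNew();
int maxA = int.MinValue;
foreach (int entry in data) { if (entry > maxA) maxA = entry; }
int minA = int.MaxValue;
foreach (int entry in data) { if (entry < minA) minA = entry; }
swA.Stop();

// Algorithm B: one pass that tracks both values.
var swB = Stopwatch.StartNew();
int minB = int.MaxValue, maxB = int.MinValue;
foreach (int entry in data)
{
    if (entry < minB) minB = entry;
    if (entry > maxB) maxB = entry;
}
swB.Stop();

Console.WriteLine($"A: {swA.Elapsed.TotalMilliseconds:F2} ms, B: {swB.Elapsed.TotalMilliseconds:F2} ms");
Console.WriteLine($"B / A = {swB.Elapsed.TotalMilliseconds / swA.Elapsed.TotalMilliseconds:F2}");

In practice the measured ratio drifts around 0.5 (JIT warm-up, caching, and so on), which is part of why I would rather have notation than wall-clock numbers.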
I could also do away with Big O notation and just say: algorithm B requires one iteration through the data to do the same job as algorithm A, which requires two iterations. This works, but does not use well-understood terminology. I would think well-understood terminology would be useful in white papers, where you are trying to state that your algorithm runs in 1/100 of the time of another algorithm. This need for terminology is, I assume, why people came up with Big O notation in the first place!
So, is there well-defined terminology for this? Or is this question silly, and anyone who wonders about it should just get on with their life?