I have been dealing with very small data stores where it really did not matter how wasteful I was when parsing through the data. I recently started work on a data store with records in the 100,000s, and am looking into optimizing my algorithms. I just reduced my runtime by a factor of several hundred, and was trying to benchmark a few other solutions. I have a question about terminology:

Is there a well-defined way, like Big O notation, to say "This algorithm takes half the time of that algorithm"?

Big O notation is a well-understood, cross-platform way to describe the time complexity of an algorithm: for example, a binary search on an ordered table takes O(log n) time, whereas a linear search on an unordered table takes O(n) time. (Adrian Mejia: Big O cheatsheet and examples)

Big O notation (and the definition of time complexity) is about growth rates. Algorithms that take 2n, n, and n/2 steps all grow at a linear rate and are all expressed as O(n). Thus, we drop the constant preceding the n when we use Big O notation and accept that it is really only useful for comparing algorithms across classes like O(1), O(log n), O(n), O(n^2), and so on. (StackOverflow: Why is constant always dropped from big O analysis?)
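For reference, the standard textbook definition makes the "drop the constant" rule precise (general background, not from the linked posts):

f(n) \in O(g(n)) \iff \exists\, c > 0,\ n_0 \text{ such that } f(n) \le c \cdot g(n) \text{ for all } n \ge n_0

Under this definition 2n, n, and n/2 all belong to O(n): take c = 2, c = 1, and c = 1/2 respectively.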

The best reason I have found for this is that these constants are implementation dependent. If my Windows XP computer from 2002 and your Windows 10 computer from 2019 do the same task, the Windows XP machine may take 2n time to do what your computer does in n/2 time.

Part of the optimization I recently did is to the following effect: I have an algorithm in my software that goes through a list of, say, 100,000 data points to get the max and min values. I used to iterate through the entire list to find the max, and then iterate through the entire list again to find the min, in two different functions that were miles apart. I now iterate through it once to find both the max and the min values, and then pass the two values around until I need them. If we assume iterating through a list takes n time, then I used to spend 2n time iterating through the list twice, as opposed to now doing the job in n time with a single pass. It does not matter what hardware you use, an 18-year-old computer or a brand new one: the new algorithm is done in half the time.

int minValue = int.MaxValue;  // start above any possible entry
int maxValue = int.MinValue;  // start below any possible entry
foreach (int entry in myList) // single pass finds both extremes
{
    if (entry < minValue) minValue = entry;
    if (entry > maxValue) maxValue = entry;
}

(If you notice that this is C# / .NET and suggest using LINQ instead to speed up the algorithm, you have clearly missed the point of the question.)
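For contrast, the old approach looked roughly like this (a simplified sketch of what is described above, not the original code; the two loops actually lived in two different functions):

// The old approach: two separate passes over the same list
// (in the real code these lived in two different functions).
int minValue = int.MaxValue;
foreach (int entry in myList)
{
    if (entry < minValue) minValue = entry;
}

int maxValue = int.MinValue;
foreach (int entry in myList)
{
    if (entry > maxValue) maxValue = entry;
}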

I have not been able to find a concise, well-understood way to say this, like Big O notation. Big O, Big Omega, Big Theta, and little o notation all deal with time complexity. Thus, all are concerned only with growth rates and drop any constant in front of n.

One way I thought of was to benchmark my two implementations and say: for 10,000 points on a Windows 10 production machine, algorithm A took 15 seconds and algorithm B took 7.5 seconds. But I do not care about the absolute times, just that algorithm B runs in half the time.
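Such a benchmark might look like the sketch below, using System.Diagnostics.Stopwatch. FindMinMaxTwoPasses and FindMinMaxOnePass are illustrative stand-ins for algorithms A and B, and only the ratio is reported. A real benchmark would also warm up the JIT and average over many runs.

using System;
using System.Diagnostics;

class MinMaxBenchmark
{
    // Algorithm A: two separate passes (illustrative stand-in).
    static (int Min, int Max) FindMinMaxTwoPasses(int[] data)
    {
        int min = int.MaxValue, max = int.MinValue;
        foreach (int e in data) { if (e < min) min = e; }
        foreach (int e in data) { if (e > max) max = e; }
        return (min, max);
    }

    // Algorithm B: one combined pass.
    static (int Min, int Max) FindMinMaxOnePass(int[] data)
    {
        int min = int.MaxValue, max = int.MinValue;
        foreach (int e in data)
        {
            if (e < min) min = e;
            if (e > max) max = e;
        }
        return (min, max);
    }

    static void Main()
    {
        // Fixed seed so both algorithms see identical data.
        var rng = new Random(42);
        var data = new int[10_000];
        for (int i = 0; i < data.Length; i++) data[i] = rng.Next();

        var sw = Stopwatch.StartNew();
        FindMinMaxTwoPasses(data);
        sw.Stop();
        double timeA = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        FindMinMaxOnePass(data);
        sw.Stop();
        double timeB = sw.Elapsed.TotalMilliseconds;

        // The ratio is the quantity of interest, not the raw times.
        Console.WriteLine($"B / A = {timeB / timeA:F2}");
    }
}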

I could also do away with Big O notation and just say: algorithm B requires one iteration through the data to do the same job as algorithm A, which requires two iterations. This works, but does not use well-understood terminology. I would think well-understood terminology would be useful in white papers, where you are trying to state that your algorithm runs in 1/100 of the time of another algorithm. This need for terminology is, I assume, why people came up with Big O notation in the first place!

So, is there well-defined terminology? Or is this question silly, and should anyone who wonders about it just get on with their lives?

John Doe
  • (Not the point of the question, but...) *"The new algorithm is done in half the time."* - I doubt that. It's one iteration instead of two, but that one iteration does more work than each of the two simpler ones. Did you actually measure that, and were 15 seconds and 7.5 seconds the actual times? – Stefan Pochmann Mar 04 '20 at 17:01
  • They were not actual times, no. I do see your point, though. – John Doe Mar 05 '20 at 02:57

3 Answers

It is possible to do this without inventing a new notation. For example, here's how Wikipedia compares the number of comparisons done by bottom-up heapsort vs. ordinary heapsort:

While ordinary heapsort requires 2n log2 n + O(n) comparisons worst-case and on average, the bottom-up variant requires n log2 n + O(1) comparisons on average, and 1.5n log2 n + O(n) in the worst case.

That is, for large n, ordinary heapsort does twice as many comparisons as bottom-up heapsort in the average case. This is a slight abuse of notation because it's adding a function like n log2 n to an asymptotic term like O(1), which really represents a set of functions, but it's understood as "n log2 n plus some function in O(1)".
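To spell the "twice as many" claim out (simple arithmetic on the quoted bounds): any f(n) in O(n) satisfies f(n) / (n log2 n) -> 0, so the lower-order terms vanish from the ratio:

\lim_{n \to \infty} \frac{2n \log_2 n + f(n)}{n \log_2 n + g(n)} = 2 \quad \text{for any } f(n) \in O(n),\ g(n) \in O(1)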

In the general case, we don't necessarily know what the next asymptotically smaller term should be, so instead of writing 1.5n log2 n + O(n), the weaker bound 1.5n log2 n + o(n log n) can be written using little o notation.
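For completeness, little o has a simple reading (the standard definition, not specific to this example):

f(n) \in o(g(n)) \iff \lim_{n \to \infty} \frac{f(n)}{g(n)} = 0

so 1.5n log2 n + o(n log n) pins down the leading term exactly, while promising only that the remainder is negligible compared to n log n for large n.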

Note that this makes sense when we talk about the number of operations (e.g. comparisons or swaps) done by an algorithm. However, static analysis cannot give a non-asymptotic formula for the actual running time, because the running time still depends on the time it takes to perform basic operations (e.g. reading/writing memory, adding numbers), and so differs from the operation count by an unknown constant factor. This is one reason for ignoring constant factors: it lets us talk about running time rather than just the number of operations.

kaya3

Big O, Big Omega, and Big Theta notations help us reason about classes of problems and solutions. Once you have found two solutions to the problem within the same class, looking at the constants is definitely desirable in analysis and comparison.

Big O bounds also differ between best-case and worst-case scenarios, so further judgement and detail are not being overlooked, and bringing constants and other caveats back into the picture is not unheard of.

Therefore there is definitely value in talking about O(n)-class solutions and then comparing 2n vs. n algorithms.

diginoise

Why don't you write T2 / T1 = 2?

  • Perhaps because that uses no asymptotic notation, so it asserts that the ratio is exactly 2, for all values of n. – kaya3 Mar 04 '20 at 16:16
  • @kaya3: running the algorithm twice gives a ratio 2, exactly and for all N (in practice ±50% :-) – Mar 04 '20 at 16:36