66

Let's say we have a problem where one algorithm, call it algorithm_1, solves it with time complexity O(n^2) and another, algorithm_2, solves it with time complexity O(n), but in practice we observe that algorithm_1 is faster for n < 1000 and algorithm_2 is faster otherwise.

Why can't we just write code like this:

if (n < 1000)
  do algorithm_1
else
  do algorithm_2

Is this something programmers actually do, or are there downsides to it?

For a small program this seems like a good idea.

Ak2399
    Off the top of my head, for example, in most implementations, `qsort` does this with insertion sort and quicksort. Timsort is a mix, too. – Neil Jan 12 '23 at 22:20
  • 1
    @Neil I just put the exact same thing about timsort in an answer, the coincidences of life. – Caridorc Jan 12 '23 at 22:22
  • 8
    Yes this is a valid option if you have enough data to support your decision. But of course there are also downsides: Twice the amount of code is also twice the amount of code to maintain and twice as much possibility to make mistakes ... – derpirscher Jan 12 '23 at 22:25
  • 16
    Standard libraries in many programming languages are **full** of if-else conditions exactly like the one you describe. – Stef Jan 12 '23 at 22:31
  • 4
    Often for small n the difference isn't enough to be concerned over, even if the complexity for large n is significant. Recursive algorithms might be an exception. – Mark Ransom Jan 13 '23 at 01:32
  • 6
    I once used a bubble sort in production code, because I knew it was quick enough for our typical case of n=2 or n=4. The next version of the program added a new feature that completely invalidated that typical case assumption, and I'm sure I was cursed many times by the person who inherited that code. – Mark Ransom Jan 13 '23 at 01:38
  • 1
    When you said "high time complexity", I thought you meant exponential time (O(n!), O(2^n), etc.). O(n^2) isn't fast, but it also isn't particularly slow in the space of all possible algorithms. – NotThatGuy Jan 13 '23 at 10:45
  • 6
    "Why cant we just write code like this" – It's hard to answer this question without knowing why you think that we can't write code like this, especially since you prove *in the very next line* that we *can* write code like this. – Jörg W Mittag Jan 13 '23 at 11:02
  • 2
    Not quite the same, but: [SAT solvers](https://en.wikipedia.org/wiki/SAT_solver) are used for a lot of problems, even though SAT is known to be NP-complete. Still, they are efficient enough on a wide class of instances where you'd otherwise have to switch to a special-cased routine. – phipsgabler Jan 13 '23 at 12:20
  • @NotThatGuy well, what's “fast” depends a lot on the application. In some cases, it means “must be linear time” (perhaps amortized, or at most something like _n_ · log _n_). In others, even an exponential algorithm might be considered fast, if no polynomial one exists and all alternatives have bigger overhead. – leftaroundabout Jan 13 '23 at 14:11
  • 1
    One place you see this a lot is in SQL query planners. They keep statistics on the various tables in the database and estimate which techniques will be fastest for a particular query. – 9072997 Jan 13 '23 at 16:01
  • 2
    "Is this a real thing programmers do or are there down sides for this?" It's both. Dada's answer about hash table implementation gives an example where a data structure swaps out implementations on the fly based on size. As for downsides they're obvious: more code == more bugs. More complexity == higher cognitive load for reading/modifying the code. The need of testing to set the correct threshold for the algorithm change, and re-testing every few years to see if it's still worth doing and if so at what size threshold. Etc. etc. – Jared Smith Jan 14 '23 at 16:10
  • In some applications in computational chemistry you have to perform a [graph isomorphism problem](https://en.wikipedia.org/wiki/Graph_isomorphism_problem) to figure out if two molecules can react with each other. You have a lot of molecules to test but each one is represented by a relatively small graph, so the fact that graph isomorphism is expensive doesn't really matter too much. – N. Virgo Jan 15 '23 at 04:33

9 Answers

74

This does happen in the real world! For example, a famous sorting algorithm is Timsort:

Timsort

Details of the implementation below:

We consider the size of a run to be 32, and the input array is divided into sub-arrays.

We sort the pieces of run size one by one with simple insertion sort. After sorting the individual pieces, we merge them one by one with merge sort.

We double the size of the merged subarrays after every iteration.

Insertion sort has complexity O(N^2) but is faster for tiny lists; merge sort has complexity O(N log N), so it is better for longer lists.
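
To make the idea concrete, here is a minimal C++ sketch of a size-threshold hybrid in the same spirit (this is not Timsort itself; there is no run detection or galloping, just merge sort handing ranges below a threshold to insertion sort; the RUN value of 32 simply mirrors the description quoted above):

#include <algorithm>
#include <cstddef>
#include <vector>

const std::size_t RUN = 32;   // below this size the O(N^2) algorithm wins in practice

// Simple insertion sort on the half-open range [lo, hi).
static void insertion_sort(std::vector<int>& v, std::size_t lo, std::size_t hi)
{
    for (std::size_t i = lo + 1; i < hi; ++i) {
        int key = v[i];
        std::size_t j = i;
        while (j > lo && v[j - 1] > key) { v[j] = v[j - 1]; --j; }
        v[j] = key;
    }
}

// Merge sort that delegates small ranges to insertion sort.
static void hybrid_sort(std::vector<int>& v, std::size_t lo, std::size_t hi)
{
    if (hi - lo <= RUN) {                     // tiny piece: use the "slow" algorithm
        insertion_sort(v, lo, hi);
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;     // large piece: O(N log N) merge sort
    hybrid_sort(v, lo, mid);
    hybrid_sort(v, mid, hi);
    std::inplace_merge(v.begin() + lo, v.begin() + mid, v.begin() + hi);
}

void hybrid_sort(std::vector<int>& v) { hybrid_sort(v, 0, v.size()); }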

Introsort

Another example is introsort, the sort used in the C++ standard library:

Introsort or introspective sort is a hybrid sorting algorithm that provides both fast average performance and (asymptotically) optimal worst-case performance. It begins with quicksort, switches to heapsort when the recursion depth exceeds a level based on (the logarithm of) the number of elements being sorted, and switches to insertion sort when the number of elements is below some threshold. This combines the good parts of the three algorithms, with practical performance comparable to quicksort on typical data sets and worst-case O(n log n) runtime due to the heap sort. Since the three algorithms it uses are comparison sorts, it is also a comparison sort.
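
A rough sketch of that control flow in C++ (this is not the actual standard library implementation; the small-range threshold of 16, the depth limit of 2·log2(n) and the plain Lomuto partition are just illustrative choices):

#include <algorithm>
#include <cmath>
#include <vector>

// Insertion sort for tiny partitions, inclusive range [lo, hi].
static void insertion_sort(std::vector<int>& v, int lo, int hi)
{
    for (int i = lo + 1; i <= hi; ++i) {
        int key = v[i], j = i - 1;
        while (j >= lo && v[j] > key) { v[j + 1] = v[j]; --j; }
        v[j + 1] = key;
    }
}

static void intro_sort_impl(std::vector<int>& v, int lo, int hi, int depth_limit)
{
    const int SMALL = 16;
    if (hi - lo + 1 <= SMALL) {               // few elements: insertion sort
        insertion_sort(v, lo, hi);
        return;
    }
    if (depth_limit == 0) {                   // recursion too deep: fall back to heapsort
        std::make_heap(v.begin() + lo, v.begin() + hi + 1);
        std::sort_heap(v.begin() + lo, v.begin() + hi + 1);
        return;
    }
    int pivot = v[hi], p = lo;                // otherwise: quicksort (Lomuto partition)
    for (int i = lo; i < hi; ++i)
        if (v[i] < pivot) std::swap(v[i], v[p++]);
    std::swap(v[p], v[hi]);
    intro_sort_impl(v, lo, p - 1, depth_limit - 1);
    intro_sort_impl(v, p + 1, hi, depth_limit - 1);
}

void intro_sort(std::vector<int>& v)
{
    if (v.size() > 1)
        intro_sort_impl(v, 0, (int)v.size() - 1, 2 * (int)std::log2((double)v.size()));
}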

More complexity downside

The downside of using more algorithms for a single task is clearly increased complexity. It is worth it if you are writing standard library code for a programming language that will be reused millions or even billions of times. For smaller projects, implementing only one algorithm and prioritizing developer time over machine time is often the better choice.


Caridorc
  • The downside of using more algorithms for a single task is clearly increased complexity. – Caridorc Jan 12 '23 at 22:28
  • 3
    Not algorithmic complexity though, if you did it right! – Mad Physicist Jan 14 '23 at 03:16
  • 2
    And presumably, the more code involved, the more likely it is to suffer from bad cache performance? – gidds Jan 14 '23 at 13:40
  • @gidds No. Where did you get that idea? – Passer By Jan 16 '23 at 07:14
  • @PasserBy: Instruction cache, not data cache. Large code taking up too much instruction cache can cause performance problems, which is one of the reasons we don't do things like aggressively unroll every loop in a program. – user2357112 Feb 24 '23 at 22:38
  • @user2357112 Amount of code doesn't translate into amount of instructions that's going to cause instruction cache problems well. – Passer By Feb 25 '23 at 04:46
34

Yes, this is common. For example, bignum multiplication can be done in several ways:

  • naive O(N^2)
  • Karatsuba O(N^1.585)
  • Schönhage–Strassen O(N*log(N)*log(log(N)))
  • and there are also more advanced, slightly faster algorithms now

So based on the bit width of the inputs, the fastest version is used (I use this for performance-critical "low-level" math operations on bignums all the time, because they are the building blocks for higher operations, and without this the higher operations would not run at optimal speed across the whole practical range of numbers).

However, the thresholds depend on the hardware architecture, the compiler used and sometimes even the usage pattern, so they might differ from computer to computer. To ensure the best performance, the thresholds are therefore sometimes measured at program startup or during a configuration phase instead of being hardcoded.
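
For illustration, here is a hedged sketch of what such a startup calibration could look like (this is not taken from any particular bignum library; the caller passes in benchmark callables for the two implementations being compared, e.g. the naive and the Karatsuba multiply run on n-limb operands):

#include <chrono>
#include <cstddef>

// Returns the first input size (scanning powers of two) at which the
// asymptotically better routine `fast` actually beats the simple routine `slow`.
// Both arguments are callables taking the input size and running one operation.
template <class Slow, class Fast>
std::size_t find_crossover(Slow slow, Fast fast,
                           std::size_t min_n = 64, std::size_t max_n = 1 << 20)
{
    using clk = std::chrono::steady_clock;
    auto time_one = [](auto&& f) {
        auto t0 = clk::now();
        f();
        return std::chrono::duration<double>(clk::now() - t0).count();
    };
    for (std::size_t n = min_n; n <= max_n; n *= 2)
        if (time_one([&] { fast(n); }) < time_one([&] { slow(n); }))
            return n;                 // the better asymptotics start to pay off here
    return max_n;                     // never crossed over in the tested range
}

The returned threshold would then be stored and used by the size check (or by a function pointer table like the ones shown below) for the rest of the program's run.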

This is usually applied to functions that see a huge variety of input sizes, and not to trivial functions, because the initial if statement that selects between algorithms is itself a performance hit (a branch). In some "rare" cases you can do this branchlessly, for example when the input size is also a template parameter (or maybe even a macro), like here (C++):

Gaussian elimination without result for acceleration

// Explicit specializations: closed-form determinants for 1x1 and 2x2 matrices.
template <> double matrix<1>::det() { return a[0][0]; }
template <> double matrix<2>::det() { return (a[0][0]*a[1][1])-(a[0][1]*a[1][0]); }
// General case: Laplace expansion along the first row, recursing on (N-1)x(N-1) minors.
template <DWORD N> double matrix<N>::det()
        {
        double d=0.0; int j;
        matrix<N-1> m;
        for (j=0;j<N;j++)
            {
            m=submatrix(0,j);                       // minor: drop row 0 and column j
            if (int(j&1)==0) d+=a[0][j]*m.det();    // alternate signs along the row
             else            d-=a[0][j]*m.det();
            }
        return d;
        }

As you can see, there are three different methods for computing the determinant, selected by the input size, but with no branches doing the selection. However, only hardcoded thresholds can be used this way.

You can also achieve this with function pointers, for example (C++):

// Three dummy algorithms standing in for real implementations.
int algorithm1(int x){ return 10; }
int algorithm2(int x){ return 20; }
int algorithm3(int x){ return 30; }
// Dispatch table indexed by x/1024 (valid for x < 4096 because of the &3 mask).
int (*functions[4])(int x)={ algorithm1,algorithm2,algorithm3,algorithm3 };
int function(int x)
   {
   return functions[(x>>10)&3](x);
   }

So for x below 1024 algorithm1 is used, below 2048 algorithm2, and algorithm3 for the rest, again without any branches. Here you can have dynamic thresholds too (not that flexible, but still usable): you sample your x range at some power of 2 (like 2^10 = 1024 here) and just use duplicate entries. So, for example, if the rounded thresholds are 3*1024 and 5*1024, you can do this:

int algorithm1(int x){ return 10; }
int algorithm2(int x){ return 20; }
int algorithm3(int x){ return 30; }
// Duplicate entries encode the thresholds 3*1024 and 5*1024 (valid for x < 8192).
int (*functions[8])(int x)=
   {
   algorithm1,
   algorithm1,
   algorithm1,
   algorithm2,
   algorithm2,
   algorithm3,
   algorithm3,
   algorithm3
   };
int function(int x)
   {
   return functions[(x>>10)&7](x);
   }

So with this approach you can also build the function pointer array from thresholds measured at runtime...

Spektre
  • Thank you for your contribution; can you please add a source for your claim (that well-known bignum libraries implement different algorithms depending on the size of the input)? – Caridorc Jan 13 '23 at 13:46
  • 1
    @Caridorc you can measure it easily, as for example Schönhage-Strassen is very slow for inputs below ~100000 bits in comparison to Karatsuba, so it's enough to measure something like `1e10 * 1e10` and `1e100000 * 1e100000` and check whether the complexity is the same or not. You can use any big integer lib or computing SW and measure the complexity as a function of input size, see: [measuring time complexity](https://stackoverflow.com/a/64784308/2521214). Once you see that the complexity changes after some input size, then it's clearly the case. – Spektre Jan 13 '23 at 15:27
  • 6
    You can also dissect any decent BigInteger lib's source code to confirm this, like [GMP](https://gmplib.org) for example. My own libs do it all the time for crucial functions which are used as building blocks for higher operations. – Spektre Jan 13 '23 at 15:28
  • "*You can achieve this also with function pointers*" - I wouldn't call that approach branchless though – Bergi Jan 14 '23 at 22:42
  • @Bergi why not I see no conditional jump in there (unless I miss something obvious)... or is it that jump to indexed pointer array is considered as (pipeline/CACHE performance unfriendly) brunch too ? – Spektre Jan 15 '23 at 00:12
  • 3
    Yes. A computed jump is branching just like a conditional jump. – Bergi Jan 15 '23 at 02:27
  • You keep saying "brunch". It's "branch". – Kelly Bundy Feb 09 '23 at 06:35
19

As the other answers have said: yes!

Another example is Java's HashMap class. This class resolves collisions using separate chaining. Initially, each bucket contains a linked list, but if the length of this list grows past some threshold (8 in Java 8), it is converted into a TreeNode (a TreeNode is implemented as a red-black tree). Finding an item in a bucket through a linked list has O(n) time complexity, while the TreeNode has O(log n) time complexity.

Interestingly, the use of the linked list instead of the TreeNode is not (mainly) to save time, but rather to save space. A comment in the source code says:

Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use
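
To make that pattern concrete outside of Java, here is a small illustrative C++ sketch (this is not the real HashMap code, and it is simplified to switch the whole container rather than a single bucket): entries live in a flat vector and are found by linear scan while the map is small, then migrate into a balanced tree with O(log n) lookup once the size passes a threshold.

#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

class AdaptiveMap {
    static const std::size_t TREEIFY_THRESHOLD = 8;  // same value the quoted comment refers to
    std::vector<std::pair<std::string, int>> flat;   // linear scan while small
    std::map<std::string, int> tree;                 // balanced tree once large
    bool treeified = false;

public:
    void put(const std::string& key, int value) {
        if (treeified) { tree[key] = value; return; }
        for (auto& kv : flat)
            if (kv.first == key) { kv.second = value; return; }
        flat.emplace_back(key, value);
        if (flat.size() > TREEIFY_THRESHOLD) {       // grew past the threshold: convert
            tree.insert(flat.begin(), flat.end());
            flat.clear();
            treeified = true;
        }
    }

    const int* get(const std::string& key) const {
        if (treeified) {
            auto it = tree.find(key);
            return it == tree.end() ? nullptr : &it->second;
        }
        for (const auto& kv : flat)
            if (kv.first == key) return &kv.second;
        return nullptr;
    }
};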

Dada
17

Yes, using multiple algorithms can increase speed even for large n

Other answers mention advanced algorithms that combine several primitive algorithms into a more efficient one, including:

  • Combining insertion sort and merge sort (in the case of Timsort)
  • Combining naive and Strassen's algorithms for matrix multiplication
  • Combining naive and Schönhage-Strassen algorithms for multiplying big numbers

Note that the algorithms with better complexity but worse small-n runtime used here are recursive. That means they call themselves on smaller and smaller pieces of the data. So even if the data structure is initially large enough to make the better-complexity algorithm faster at the top level, the recursion eventually reduces the work to one or more problems of small size, even when n is initially big.

This means that even for large n where the recursive algorithm is initially used, a large performance benefit can be gained by switching to a faster algorithm once the problem size has been reduced enough to make it viable.
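
As a sketch of that effect, here is a hedged C++ example of Karatsuba multiplication that falls back to the naive O(n^2) routine once the recursion reaches small operands. To keep it short it multiplies plain coefficient vectors (i.e. polynomials, so there is no carry handling as in a real bignum type), and the cutoff of 32 is an arbitrary illustrative value; real libraries tune it by measurement.

#include <algorithm>
#include <cstddef>
#include <vector>

using i64 = long long;

// Schoolbook O(n^2) multiplication -- fastest for small operands.
static std::vector<i64> mul_naive(const std::vector<i64>& a, const std::vector<i64>& b)
{
    std::vector<i64> r(a.size() + b.size() - 1, 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            r[i + j] += a[i] * b[j];
    return r;
}

static const std::size_t KARATSUBA_CUTOFF = 32;   // illustrative; tuned by measurement in practice

// Karatsuba O(n^1.585): three half-size products instead of four.
// The recursion bottoms out in mul_naive, so the "slow" algorithm still does
// all the small-sized work even when n is initially huge.
static std::vector<i64> mul_karatsuba(std::vector<i64> a, std::vector<i64> b)
{
    if (a.empty() || b.empty()) return {};
    if (a.size() <= KARATSUBA_CUTOFF || b.size() <= KARATSUBA_CUTOFF)
        return mul_naive(a, b);

    std::size_t result_len = a.size() + b.size() - 1;
    std::size_t n = std::max(a.size(), b.size());
    a.resize(n, 0);                                  // pad so both operands split at the same point
    b.resize(n, 0);
    std::size_t half = n / 2;

    std::vector<i64> a_lo(a.begin(), a.begin() + half), a_hi(a.begin() + half, a.end());
    std::vector<i64> b_lo(b.begin(), b.begin() + half), b_hi(b.begin() + half, b.end());

    std::vector<i64> z0 = mul_karatsuba(a_lo, b_lo);         // low * low
    std::vector<i64> z2 = mul_karatsuba(a_hi, b_hi);         // high * high

    std::vector<i64> a_sum(std::max(a_lo.size(), a_hi.size()), 0);
    for (std::size_t i = 0; i < a_lo.size(); ++i) a_sum[i] += a_lo[i];
    for (std::size_t i = 0; i < a_hi.size(); ++i) a_sum[i] += a_hi[i];
    std::vector<i64> b_sum(std::max(b_lo.size(), b_hi.size()), 0);
    for (std::size_t i = 0; i < b_lo.size(); ++i) b_sum[i] += b_lo[i];
    for (std::size_t i = 0; i < b_hi.size(); ++i) b_sum[i] += b_hi[i];

    std::vector<i64> z1 = mul_karatsuba(a_sum, b_sum);       // (lo+hi)*(lo+hi)
    for (std::size_t i = 0; i < z0.size(); ++i) z1[i] -= z0[i];
    for (std::size_t i = 0; i < z2.size(); ++i) z1[i] -= z2[i];

    std::vector<i64> r(2 * n - 1, 0);                        // recombine the three products
    for (std::size_t i = 0; i < z0.size(); ++i) r[i] += z0[i];
    for (std::size_t i = 0; i < z1.size(); ++i) r[i + half] += z1[i];
    for (std::size_t i = 0; i < z2.size(); ++i) r[i + 2 * half] += z2[i];
    r.resize(result_len);
    return r;
}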

mousetail
  • 4
    Matrix multiplication is such a good example of this that it would warrant its own answer. Not only is a mix of naive and Strassen's algorithms actually used in practice with a clause `if side < 100: naive; else: Strassen's`; but there exist several different algorithms with better asymptotic complexity than Strassen's, and none of them is used in practice. Naive is O(n^3), Strassen's is O(n^2.8), best-known is O(n^2.37) (where n is the side of the matrices, assuming the two matrices are square). – Stef Jan 15 '23 at 20:41
  • 2
    @Stef The algorithms that are better than Strassen are generally galactic algorithms: they have such a huge (hidden) constant in the complexity that they are basically useless even for pretty huge matrices on current computers. As for Strassen, AFAIK most [BLAS](https://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms) libraries do not use it. They just use the naive `O(n**3)` algorithm, even for big matrices. This is because Strassen is only interesting for huge matrices (due to being less cache-friendly) and it is AFAIK *less numerically stable*. – Jérôme Richard Jan 22 '23 at 15:17
8

The other answers here have given plenty of examples, but I wanted to expand on them with one major reason high time complexity algorithms are sometimes used.

When talking about the space/time complexity of an algorithm, it's generally implied that we mean the worst-case, or perhaps average-case of the algorithm. However, real world datasets and problems often have exploitable patterns in them, where they might be closer to the best-case for an algorithm than its worst-case.

Consider a couple of specific examples:

  • Timsort is a hybrid sort that primarily uses merge sort, which is relatively performant with a time complexity of O(n log n), but it also intentionally uses techniques with much worse worst-case performance. The reason it is designed this way is that it acts like a state machine, observing whether the data being sorted is already somewhat ordered, and if so it picks an approach that performs well in the presence of that kind of order. (See also adaptive sorting, and the run-detection sketch after this list.)

  • Compression algorithms generally produce larger compressed sizes in their worst cases, but the data people actually want to compress isn't random; video, audio, text all have patterns that can be exploited that mean compression yields large space savings.
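
For a flavor of what "observing existing order" means, here is a tiny C++ sketch of a run-detection helper in the spirit of what Timsort does (simplified: real Timsort also recognizes strictly descending runs and reverses them, and enforces a minimum run length):

#include <cstddef>
#include <vector>

// Length of the already-ascending run starting at `start`.
// On partially ordered data this returns long runs that can be kept as-is
// and merged later, instead of being re-sorted element by element.
std::size_t count_ascending_run(const std::vector<int>& v, std::size_t start)
{
    if (start >= v.size()) return 0;
    std::size_t i = start + 1;
    while (i < v.size() && v[i - 1] <= v[i])
        ++i;                          // extend the run while the order holds
    return i - start;
}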

Blackhawk
  • 4
    And sometimes high-complexity algorithms are used because they've got better constant factors. For example, there's an O(n log n) algorithm for multiplying numbers, but the constant factors on that are so bad that it's impractical for any number small enough to be stored in a modern computer. – Mark Jan 13 '23 at 22:19
  • *"Timsort is a hybrid sort that uses several algorithms with relatively high worst-case complexity. However, it is specifically designed this way because real world data is often at least partly ordered already with runs of ordered elements, and so Timsort will be able to perform much closer to its best-case than its worst-case."* Do you have a source for this claim? I was sure Timsort's worst-case was still O(n log(n)), but you appear to be implying that it isn't? – Stef Jan 15 '23 at 20:36
  • @Stef good call, that was a very imprecise hand wave :P I've updated the answer - see if that's more reasonable – Blackhawk Jan 15 '23 at 21:39
  • 1
    @Blackhawk Your sentence *"combining these algorithms allows Timsort to perform much closer to its best-case than its worst-case for a lot of real world data."* still makes it sound like Timsort could reach a worst-case of n² on some "unlucky" data, but that's okay because it's O(n log(n)) on most "real-world" data. I don't think that"s true. I think Timsort is guaranteed to be O(n log(n)). – Stef Jan 16 '23 at 08:51
  • @Stef my point is that while Timsort relies on mergesort in the worst case, it leverages insertion sort (a worse worst-case) to take advantage of the inherent order ("runs") present in most datasets people care to sort. I assert that Timsort achieves vastly more than O(n log n), approaching best case in those datasets where partial order in the form of runs is present. – Blackhawk Jan 16 '23 at 16:05
  • @Stef See [Tim's original description of his algorithm](https://bugs.python.org/file4451/timsort.txt) - he has test data for both random data and for partially ordered data. "Now for the more interesting cases. lg(n!) is the information-theoretic limit for the best any comparison-based sorting algorithm can do on average (across all permutations). When a method gets significantly below that, it's either astronomically lucky, or is finding exploitable structure in the data." – Blackhawk Jan 16 '23 at 16:09
  • @Blackhawk But your answers appears to be saying that Timsort occasionally does run into a worst-case n² complexity. I don't believe that's the case. Yes, Timsort does sometimes do better than n log(n), but it also doesn't do worse than n log(n). – Stef Jan 16 '23 at 16:16
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/251178/discussion-between-blackhawk-and-stef). – Blackhawk Jan 16 '23 at 16:26
  • 1
    @Stef The question is, whether Insertion Sort is only used in Timsort in a way (only in cases), where it is running in `O(n * log n)`. The check for exploitable structures can also run in `O(n * log n)`. – Sebastian Jan 16 '23 at 16:49
  • @Sebastian see my updated answer - it's still a simplification, and you should definitely read the wikipedia article cause it's fascinating. Timsort is acting as a state machine, and it kind of does a "branch prediction" optimization where if it sees several elements in a row that are already ordered, it predicts that there could be several more, and attempts to exploit that knowledge to avoid having to naively sort that section directly, storing it as a run to be merged with other runs later. – Blackhawk Jan 16 '23 at 17:35
  • @Mark For *machine* integers, sure. But there are plenty of arbitrary-precision integer libraries that absolutely make use of things like the Karatsuba algorithm for large numbers. – chepner Jan 17 '23 at 21:57
  • @chepner, Karatsuba is O(n^1.58), which is rather slower than O(n log n). It beats the the O(n^2) "grade school" algorithm for n somewhere around 500 bits. The crossover point for the O(n log n) Harvey-van der Hoeven algorithm appears to be somewhere around 2^713739807325663489766475852620783120641 bits. – Mark Jan 17 '23 at 23:18
  • @Mark I never said Karatsuba was O(n log n), or that it was replacing another O(n lg n) algorithm. – chepner Jan 18 '23 at 13:14
  • What are Timsort's "**several** much worse performance algorithms"? I only know its insertion sort. – Kelly Bundy Feb 09 '23 at 06:50
  • @KellyBundy I suppose it's somewhat disingenuous of me to imply that the algorithms are all sorting algorithms. If you take a look at the wikipedia page for timsort [operation](https://en.wikipedia.org/wiki/Timsort#Operation), you'll see that in order to take advantage of partial ordering (ascending or descending), it also must do searching/reversing. If you were to run these same steps against random data, it would perform terribly if at all. The only reason these parts of timsort are performant is because of patterns present in most data people care to sort. – Blackhawk Feb 09 '23 at 15:14
  • Hmm, I don't get what you mean with *"If you were to run these same steps against random data"*. It's not like reversing random data isn't done because that would "perform terribly". It's not done because there's no sense to it. Same with searching, I guess, not sure which one you mean. – Kelly Bundy Feb 09 '23 at 15:47
  • @KellyBundy I mean that these steps are run specifically to achieve a sort, but it only works because of patterns present in the data, so if you were trying to sort random data with these steps it would either take a very long time to sort or flat out fail. Consider how a video compression algorithm might perform if run against text data - if it didn't flat out fail because of hard assumptions about the structure of the data, the compressed size would likely be larger than uncompressed. – Blackhawk Feb 09 '23 at 16:11
  • 1
    @KellyBundy ...From a computer science perspective, algorithms are measured based on their performance across all possible inputs, whereas practically speaking humans only care about a very small subset of all inputs (audio, video, text, structured records, partially sorted lists, etc.), and algorithms that perform well in the average case on specific subsets are often more useful in practice than generalized algorithms that minimize the worst case. – Blackhawk Feb 09 '23 at 16:16
  • 1
    @KellyBundy to quote my earlier comment, "Now for the more interesting cases. lg(n!) is the information-theoretic limit for the best any comparison-based sorting algorithm can do on average (across all permutations). When a method gets significantly below that, it's either astronomically lucky, or is finding exploitable structure in the data." - [Tim Peters](https://bugs.python.org/file4451/timsort.txt) – Blackhawk Feb 09 '23 at 16:20
6

In formal verification, we solve NP-complete, exponential and even undecidable problems all the time, all of them using algorithms much worse than O(n²) (provided a potentially never-ending search can be considered an algorithm).

The compiler for the Ethereum smart contract language, Solidity, interfaces with the SMT solver Z3 in order to try to formally prove the asserts in the code will always hold, and that problem is undecidable.

The computer algebra algorithm F5 is exponential and was used to break HFE cryptosystems.

There are many more examples. It is less about the size of the input and more about luck and the input having the kind of structure where the algorithm will not hit its worst-case complexity (e.g., bubble sort is O(n²) in the worst case, but if the array is almost sorted in a certain way, it can run in linear time).

lvella
  • 2
    Good examples, except for Bubble Sort; one element far out of place can make it take nearly the full N^2, or just N depending on which direction it's moving. A better example is Insertion Sort, which is always pretty close to linear for almost-sorted arrays. https://en.wikipedia.org/wiki/Bubble_sort#Rabbits_and_turtles / [Why bubble sort is not efficient?](https://stackoverflow.com/q/61997897) / [Bubble Sort: An Archaeological Algorithmic Analysis](https://users.cs.duke.edu/~ola/papers/bubble.pdf) – Peter Cordes Jan 14 '23 at 02:07
  • A problem is NP-complete. An algorithm is not NP-complete. If a particular algorithm's worst-case runtime is much worse than n², then it's going to remain much worse than n² regardless of whether P and NP are equal or not. – Stef Jan 15 '23 at 20:34
  • @Stef I think Ivella meant algorithms for NP complete problems?!? – Sebastian Jan 15 '23 at 22:07
  • @Stef What I meant is if you can find a O(n²) algorithm to a NP-complete problem, then P = NP, not the converse. – lvella Jan 16 '23 at 11:11
  • @Ivella May I suggest rewriting the first sentence of the answer, *"In formal verification we use NP-complete, exponential and undecidable algorithms all the time (all of these much worse than O(n²) (provided P != NP))."*, so that it's less wrong? – Stef Jan 16 '23 at 11:16
5

For most NP-hard problems, you have to use approximation algorithms or brute-force/inefficient algorithms with high time complexity.

There are lots of divide-and-conquer algorithms whose complexity varies with the type of input they are given, as in sorting, searching, arithmetic, etc.

Most algorithms take inputs, and inputs vary, so basically every real-world algorithm in use is built from smaller algorithms that do well on specific types of input. Research continues to improve the complexity of those specific algorithms, but you also have to realize that time complexity is not always a good measure of an algorithm's speed, since it only bounds the growth rate as the input goes to infinity (ignoring constant factors and how the individual O(1) operations are implemented).

Older implementations of quicksort use insertion sort for small data sets because it is efficient there, better than most other simple quadratic algorithms such as selection sort or bubble sort.

There are other use cases that are deliberately hard to compute, such as cryptocurrency mining algorithms and CPU cycle measuring...

4

This answer doesn't fully address the question, in that it is somewhat removed from everyday software development. However, it describes cases beyond the almost trivial sorting-algorithm optimizations already described in other answers.

In competitive programming, there are problems where combining two algorithms goes beyond constant-factor optimization and yields a better time complexity, usually improving O(N^2) to O(N^1.5).

This material (Chapter 27, section 2) describes one such simplified case very well: you have a grid colored with a number of colors, and for each color you must find the minimum distance between two cells of that color. A naive approach runs a BFS for each color, with a time complexity of O(number of colors * N), which reduces to O(N * N) in the case where each color appears only once.

Another approach computes, for each pair of cells of the same color, their distance and takes the minimum, with a time complexity equal to the sum of k² over all colors (where k is the number of cells of that color), which can be shown to reduce to O(N * N) in the worst case (the entire grid being one color). The text describes a clever method of selecting which colors to run the first algorithm on and using the second on the others, reaching a complexity of O(N * sqrt(N)).

For those who want to challenge themselves, here is a problem that uses a similar idea to the one described in the material with a more complex data structure. The solution can be found here.

2

One reason for not writing code like if (n < 1000) alg_1 else alg_2 is that you need to develop and test two algorithms, and you also need to verify that they produce exactly the same results under all circumstances.

Since the question states that algorithm time is a critical factor, testing might be a very time-consuming matter. It might be a good choice to just take the algorithm that gives you the best overall performance. It is also a trade-off between developer efficiency and algorithm efficiency.

OldFrank
  • 1
    The need for testing to ensure *correctness* has nothing to do with algorithmic complexity at *runtime*, though. I, as a user, do not have to confirm that insertion sort and merge sort are both correct every time I use Timsort in a program. – chepner Jan 17 '23 at 22:03
  • I provided a general answer to the general problem in the question. Many other answers, including yours, are algorithm-specific. In some specific cases where one algorithm is well-proven you might be able to skip a few steps. The more complexity and the more testing time needed, the more challenging correctness becomes. – OldFrank Jan 18 '23 at 11:51