
I have the basics of computational complexity theory down. I can understand why I might prefer one algorithm over another based on how it scales. Now that I'm there, how do I actually determine the complexity of a function I've written? How do I work out which functions to use, and which will scale better? How, for example, will I know that the telephone-book binary search is O(log n), or that the Fibonacci sequence is O(n^2), outside of trial and error? How do I determine the complexity of a function in, for example, scikit-learn?

How do I actually apply this stuff?

Yehuda

3 Answers


The tasks performed in scikit-learn are computationally heavy, which is why a good GPU is often recommended for running ML/DS-related tasks. Many of these tasks are run in parallel across cores/threads.

So it is hard to determine what the actual complexities of these functions are, but what we can do is test-run them and measure how long they take for a given input size.

Refer here for better understanding.
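For example, a rough timing sketch (assuming NumPy and scikit-learn are installed; LinearRegression is only a stand-in for whatever estimator you actually care about):

```python
# Rough empirical check: time one estimator (LinearRegression here, purely as
# an example) on synthetic inputs of growing size and see how the runtime grows.
import time
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

for n in (1_000, 10_000, 100_000, 1_000_000):
    X = rng.normal(size=(n, 20))                      # n samples, 20 features
    y = X @ rng.normal(size=20) + rng.normal(size=n)

    start = time.perf_counter()
    LinearRegression().fit(X, y)
    print(f"n = {n:>9,d}  fit took {time.perf_counter() - start:.3f} s")
```

If doubling n roughly doubles the fit time, the behaviour looks linear on that data; if it quadruples, it looks quadratic, and so on.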

Rishabh Deep Singh

Basically, you need to count the number of operations the algorithm performs as a function of the input size. The Big O is then just the highest-order term of that expression, with constant factors ignored.

One does not care about the kind of operation (comparison, assignment, ...) as long as each operation takes constant time.

For complex algorithms, that analysis can be difficult.
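For simple code you can often read the count straight off the loop structure. An illustrative sketch (my own example, not tied to any library):

```python
# Counting constant-time operations as a function of the input size n.

def sum_values(values):
    """One pass over n items -> about c*n operations -> O(n)."""
    total = 0
    for v in values:        # runs n times
        total += v          # one constant-time addition per element
    return total

def count_equal_pairs(values):
    """Nested pass over all (i, j) pairs -> about c*n^2 operations -> O(n^2)."""
    n = len(values)
    count = 0
    for i in range(n):              # runs n times
        for j in range(n):          # runs n times per outer iteration
            if values[i] == values[j]:
                count += 1
    return count
```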

For binary search: with each search step, the number of values still to be searched is halved. So twice the input requires one more search step (operation): t(2n) = t(n) + 1. This results in t(n) = c · log2(n) = O(log n), at least for powers of two. For other n the expression is more complicated, but the highest-order part is still c · log2(n).
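In code, the telephone-book lookup could look like this (illustrative sketch over any sorted Python list):

```python
# Binary search over a sorted list: the candidate range [lo, hi) is halved on
# every iteration, so the loop runs about log2(n) times -> O(log n).
def binary_search(sorted_values, target):
    lo, hi = 0, len(sorted_values)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_values[mid] == target:
            return mid                 # found
        if sorted_values[mid] < target:
            lo = mid + 1               # discard the lower half
        else:
            hi = mid                   # discard the upper half
    return -1                          # not present

print(binary_search(list(range(0, 1000, 2)), 500))   # 250
```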

For Fibonacci: the naive, recursive implementation calculates fib(n-1) and fib(n-2) in order to calculate fib(n). Hence fib(n-2) is calculated twice, fib(n-3) three times, fib(n-4) five times and so on (following the Fibonacci series itself). So the number of calculations to do is 1 + 1 + 2 + 3 + 5 + ... + fib(n-1) = fib(n) - 1. Since we are interested in the asymptotic behavior (for big n), we can apply the asymptotic approximation formula:

fib(n) ≈ φ^n / √5,  with φ = (1 + √5) / 2 ≈ 1.618

This means naive, recursive Fibonacci is O(φ^n), i.e. exponential complexity.
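To see that growth concretely, here is an illustrative instrumented version of the naive recursion that counts its own calls:

```python
# Naive recursive Fibonacci with a call counter: the number of calls itself
# grows like phi^n (phi ~ 1.618), i.e. exponentially.
def fib(n, counter):
    counter[0] += 1
    if n < 2:
        return n
    return fib(n - 1, counter) + fib(n - 2, counter)

for n in (10, 20, 30):
    counter = [0]
    print(f"fib({n}) = {fib(n, counter)}, {counter[0]} calls")
# 177 calls, 21891 calls, 2692537 calls: each extra +10 in n multiplies the
# work by roughly 123 ~ 1.618^10.
```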

A better algorithm starts from the beginning of the Fibonacci series and calculates each number once. That's obviously O(n), as it takes n (or n - 2) equal steps.
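An illustrative sketch of that bottom-up version:

```python
# Bottom-up Fibonacci: each number in the series is computed exactly once -> O(n).
def fib_iterative(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_iterative(30))   # 832040, after 30 constant-time steps
```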

undur_gongor
    Please revisit your analysis of a naive Fibonacci. As written it is plain wrong. Its correct complexity is exponential, not quadratic. – user58697 Nov 10 '20 at 18:45

scikit-learn is heavily based on numerical optimization (but not exclusively: a kd-tree, for example, is more of a classic computer-science algorithm), so a classic, computer-science-focused treatment of computational complexity is surely needed, but not enough.

For something like an interior-point solver or a coordinate-descent based SVM solver (two examples of the concepts behind ML algorithms), both of which are iterative methods like nearly everything in numerical optimization, it is not so important to know how fast you can do binary search. It is more important to know how many iterations your algorithm will need, or more specifically, how it moves through the optimization space. This is pretty tough, depends on the data (e.g. the eigenvalues of the Hessian of the cost function), and the proofs / analysis are heavily math-based, e.g. metric theory.

Things are even worse when heuristics are in play (and those are very common).

So basically: you won't be able to do this for reasonably complex algorithms.

What you should do:

  • check the docs / sources to see which algorithm is used, then find the underlying research paper and its analysis to obtain something like "cubic in sample-space"
  • do empirical analysis with your own data (see the sketch below)
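A minimal sketch of that empirical approach (my own example, assuming NumPy and scikit-learn; SVC is just a stand-in estimator, and it assumes the runtime is dominated by a single power of n): time the fit at several sample sizes and read the exponent off the slope of log(time) versus log(n).

```python
# Empirically estimate k in "runtime ~ n^k" for one estimator (SVC here, only
# as an example), assuming a single power of n dominates the runtime.
import time
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
sizes = [500, 1000, 2000, 4000, 8000]
times = []

for n in sizes:
    X = rng.normal(size=(n, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)    # synthetic binary labels
    start = time.perf_counter()
    SVC(kernel="rbf").fit(X, y)
    times.append(time.perf_counter() - start)

# Slope of log(time) vs log(n) gives the empirical exponent k.
k, _ = np.polyfit(np.log(sizes), np.log(times), deg=1)
print(f"runtime grows roughly like n^{k:.2f} on this data")
```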
sascha