What would be the best way to build a Big-O runtime complexity analyzer for pseudocode in a text file?

Question

I am trying to create a class that takes a string containing pseudocode and computes its worst-case runtime complexity. I will use regex to split the input into lines, analyze the worst case for each line, and combine the per-line complexities (based on the big-O rules) into a final worst-case runtime. The pseudocode will follow a few rules for declaration, initialization, and operations on data structures. This is something I can control. How should I go about designing such a class, considering the rules of iterative and recursive analysis? Any help in C++ or Java is appreciated. Thanks in advance.

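#include <string>

class PseudocodeAnalyzer
{
public:
  std::string inputCode;  // the full pseudocode text to analyze
  // Intended to return the complexity of a single line, e.g. "O(n)".
  std::string performIterativeAnalysis(std::string line);
  std::string performRecursiveAnalysis(std::string line);
  // Intended to combine the per-line results into one worst-case bound.
  std::string analyzeTotalComplexity(std::string inputCode);
};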

An example of an iterative algorithm: check whether every number in a grid is odd:
Array A = Array[N][N]
for i in 1 to N
  for j in 1 to N
    if A[i][j] % 2 == 0
      return false
    endif
  endloop
endloop
return true
Worst-case time complexity: O(N*N)

@RobertHarvey I had seen that post earlier. I understand that it is not possible to analyze the exact complexity of the program. I am looking to build something for say 15-20 lines of pseudocode with the best approximation — Maharshi, Dec 03 '20 at 16:09

score 3 · Accepted Answer · answered Dec 03 '20 at 16:24

The concept: "I wish to write a program that analyses pseudocode in order to print out the algorithmic complexity of the algorithm it describes" is mathematically impossible!

Let me try to explain why that is, and how you might get around it.

Your pseudocode has certain capabilities. You call it pseudocode, but given that you are now trying to parse it, it's still a 'real' language where terms have real meaning. This language is capable of expressing algorithms.

So, which algorithms can it express? Presumably, 'all of them'. There is a concept called a 'Turing machine': you can prove that anything a computer can do, a Turing machine can also do. And Turing machines are very simple things. Therefore, if you have some simplistic computer and you can use it to emulate a Turing machine, you can also use it to emulate a complete computer. This is how, in theoretical computer science, you prove that a certain CPU or system is capable of computing everything some other CPU or system can compute: use it to emulate a Turing machine, thus proving it can run it all. Any system that can be used to emulate a Turing machine is called 'Turing complete'.

Then we get to something very interesting: If your pseudocode can be used to express anything a real computer can do, then your pseudocode can be used to 'write'... your very pseudocode checker!

So let's say we do just that and stick the pseudocode that describes your pseudocode checker in a function we shall call pseudocodechecker. It takes as argument a string containing some pseudocode, and returns a string such as O(n^2).

You can then write this program in pseudocode:

if pseudocodechecker(this-very-program) == O(n^2)
  runSomeAlgorithmThatIsO(1)
else
  runSomeAlgorithmThatIsO(n^2)
endif

And this is self-defeating: We have 'programmed' a paradox. It's like "This statement is a lie", or "the set of all sets that do not contain themselves". If it is false it is true, and if it is true it is false. [Insert GIF of exploding computer here].

Thus, we have mathematically proved that what you want is impossible, unless one of the following is true:

A. Your pseudocode-based checker is incorrect. As in, it will flat out give a wrong answer sometimes, thus solving the paradox: If you feed your program a paradox, it gives a wrong answer. But how useful is such an app? An app where you know the answer it gives may be incorrect?

B. Your pseudocode-based checker is incomplete: The official definition of your pseudocode language is so limited that you cannot even write a Turing machine in it.

That last one seems like a nice solution; but it is quite drastic. It pretty much means that your algorithm can only loop over constant ranges. It cannot loop until a condition is true, for example. Another nice solution appears to be: The program is capable of realizing that an answer cannot be given, and will then report 'no answer available', but unfortunately, with some more work, you can show that you can still use such a system to develop a paradox.

Thank you for explaining it in detail. A little too ambitious of me to try to build a tool like this. — Maharshi, Dec 03 '20 at 16:36
It's only fair that if the mathematicians can't get it straight, CS can't either :P — 463035818_is_not_an_ai, Dec 03 '20 at 16:58

Mo B. · Answer 2 · 2020-12-04T09:34:47.763

The answer by @rzwitserloot and the ones given in the link are correct. Let me just add that it is possible to compute an approximation both to the halting problem and to the time complexity of a piece of code (written in a Turing-complete language!). (Compare that to the existence of automated theorem provers for arithmetic and other second order logics, which are undecidable!) A tool that under-approximates the complexity problem would output the correct time complexity for some inputs, and "don't know" for other inputs.

Indeed, the whole wide field of code analyzers, often built into the IDEs that we use every day, more often than not under-approximate decision problems that are uncomputable, e.g. reachability, nullability or value analyses.

If you really want to write such a tool: the basic idea is to identify heuristics, i.e., common patterns for which a solution is known, such as various patterns of nested for-loops with only very basic arithmetic operations manipulating the indices, or simple recursive functions where the recurrence relation can be spotted straight-away. It would actually be not too hard (though definitely not easy!) to write a tool that could solve most of the toy problems (such as the one you posted) that are given as homework to students, and that are often posted as questions here on SO, since they follow a rather small number of patterns.
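As a rough illustration of the heuristic approach, here is a minimal C++17 sketch that recognizes exactly one pattern: loops of the form "for <var> in 1 to N" closed by "endloop", as in the question's example. The function name estimateLoopNestComplexity, the rule that every non-loop line costs O(1), and the choice to answer "don't know" (an empty optional) for while loops or any other for-loop shape are all assumptions made for this sketch, not part of the original question or answers.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <regex>
#include <sstream>
#include <string>

// Hypothetical helper: estimates the worst-case complexity of pseudocode that
// only uses "for <var> in 1 to N" ... "endloop" loops and O(1) statements.
// Lines may carry an optional "12." style line-number prefix.
// Returns std::nullopt ("don't know") for anything it does not recognize.
std::optional<std::string> estimateLoopNestComplexity(const std::string& code) {
    static const std::regex boundedFor(
        R"(^\s*(?:\d+\.)?\s*for\s+\w+\s+in\s+1\s+to\s+N\s*$)");
    static const std::regex endLoop(R"(^\s*(?:\d+\.)?\s*endloop\s*$)");
    static const std::regex otherLoop(R"(^\s*(?:\d+\.)?\s*(while|for)\b)");

    int depth = 0;     // current nesting depth of N-bounded loops
    int maxDepth = 0;  // deepest nesting seen so far
    std::istringstream in(code);
    std::string line;
    while (std::getline(in, line)) {
        if (std::regex_match(line, boundedFor)) {
            maxDepth = std::max(maxDepth, ++depth);
        } else if (std::regex_match(line, endLoop)) {
            if (depth == 0) return std::nullopt;  // endloop without a loop
            --depth;
        } else if (std::regex_search(line, otherLoop)) {
            return std::nullopt;  // a loop shape this heuristic cannot bound
        }
        // Assumption: every other line costs O(1).
        assert(depth <= maxDepth);
    }
    if (depth != 0) return std::nullopt;  // unclosed loop
    if (maxDepth == 0) return "O(1)";
    if (maxDepth == 1) return "O(N)";
    return "O(N^" + std::to_string(maxDepth) + ")";
}
```

Fed the grid example from the question, this returns "O(N^2)". Fed anything containing a while loop, it returns an empty optional rather than guessing, which is the under-approximating behaviour described above. A real tool would need many more patterns (loop bounds such as N/2 or log-style index updates, recursion, calls) before it covered typical homework problems.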

If you wish to go beyond simple heuristics, the main theoretical concept underlying more powerful code analyzers is abstract interpretation. Applied to your use case, this would mean developing a mapping from code constructs in your language to code constructs in a different language (or simpler code constructs in the same language) for which it is easier to compute the time complexity. This mapping would have to conform to some constraints; in particular, the mapped constructs must have the same or worse time complexity as the original code. Actually, mapping a piece of code to a recurrence relation would be an example of abstract interpretation. So is replacing a line of code with something like "O(1)". So, the task is just to formalize some of the things that we do in our heads anyway when we are analyzing the time complexity of code.
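To make the mapping idea a little more concrete, here is a small C++17 sketch in the spirit of abstract interpretation. It assumes the pseudocode has already been parsed into a tree, and it maps each construct to an abstract cost: the exponent k in O(N^k), or "don't know". The Node type, the helper constructors, and the three rules (a statement costs O(1), a sequence costs as much as its most expensive item, a loop from 1 to N multiplies its body by N) are illustrative assumptions, not an established library or the method of any particular analyzer.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Abstract cost: the exponent k in O(N^k), or std::nullopt for "don't know".
using Degree = std::optional<int>;

// Hypothetical syntax tree for the pseudocode.
struct Node {
    enum class Kind { Statement, Sequence, ForN, While } kind;
    std::vector<Node> children;  // sequence items or loop body
};

Node statement() { return {Node::Kind::Statement, {}}; }
Node sequence(std::vector<Node> items) { return {Node::Kind::Sequence, std::move(items)}; }
Node forN(Node body) { return {Node::Kind::ForN, {std::move(body)}}; }
Node whileLoop(Node body) { return {Node::Kind::While, {std::move(body)}}; }

Degree costOfSequence(const std::vector<Node>& items);

// Maps a construct to an upper bound on its cost, never a lower one.
Degree abstractCost(const Node& node) {
    switch (node.kind) {
        case Node::Kind::Statement:
            return 0;  // O(1) = O(N^0)
        case Node::Kind::Sequence:
            return costOfSequence(node.children);
        case Node::Kind::ForN: {
            Degree body = costOfSequence(node.children);
            if (!body) return std::nullopt;
            return *body + 1;  // N iterations multiply the body cost by N
        }
        case Node::Kind::While:
            return std::nullopt;  // iteration count unknown: refuse to guess
    }
    return std::nullopt;  // not reached; keeps compilers quiet
}

// A sum of polynomial costs is dominated by the largest exponent.
Degree costOfSequence(const std::vector<Node>& items) {
    int degree = 0;
    for (const Node& item : items) {
        Degree d = abstractCost(item);
        if (!d) return std::nullopt;
        degree = std::max(degree, *d);
    }
    return degree;
}

std::string toBigO(const Degree& d) {
    if (!d) return "don't know";
    assert(*d >= 0);  // exponents produced above are never negative
    if (*d == 0) return "O(1)";
    if (*d == 1) return "O(N)";
    return "O(N^" + std::to_string(*d) + ")";
}
```

For the grid example, toBigO(abstractCost(sequence({statement(), forN(forN(statement()))}))) yields "O(N^2)". Because each rule only ever rounds the cost up, an early return inside a loop is still counted as a full pass, which matches the "same or worse" constraint mentioned above.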

Thank you for your answer. Each contribution is making it clearer for me to understand why such a tool doesn't already exist. If you think my question doesn't warrant a -1 then please upvote so a few CS enthusiasts like me aren't discouraged from asking questions. Thanks once again — Maharshi, Dec 04 '20 at 15:04
